Adds the "input blank node identifier map" #100
How about saying that the output of the c14n (single and deterministic) is the serialized form, whereas the normalized dataset (possibly non-deterministic) can be obtained as an auxiliary output?
So, would changing "Optionally" to "As an auxiliary output" satisfy this? I don't really follow how this is non-deterministic, as it would seem that a given input always produces the same normalized dataset.
I'd rather not be so prescriptive in saying that you have to return both -- I'd be happy with a both / either. We don't want to force implementations to do extra work they don't need to.
I also don't think we should say what implementations can do as a proxy for indicating that one implementation might technically map a blank node to ID A while another implementation maps it to ID B. This is @yamdan's point, I believe, but this only happens when there are isomorphisms that make this difference irrelevant. A serialized version of the dataset would look the same. We should just say this, not impose restrictions on implementations.
Perhaps that's what we say in a note: "Technically speaking, one implementation might map particular blank nodes to different identifiers than another implementation; however, this only occurs when there are isomorphisms in the dataset such that a serialized expression of the dataset would appear the same from either implementation."
And then we can say that algorithms may return both the canonically serialized dataset and the normalized dataset or either of these as requested by the invoker of the algorithm.
It might be some time before I can do this update, but it seems simple enough. I'm traveling for the next week, and internet access is spotty. Feel free to update and commit, as this is really just informative, now.
I've added a suggestion below.
@gkellogg,
A non-deterministic choice may occur in step 5.3 of the 4.4.3 Algorithm, where results in the hash path list can tie, so which result is chosen first from the list is non-deterministic. (See the debug log from my implementation.)
Even the same implementation can output different canonical issuers depending on the runtime environment or the input blank node identifiers.
The only thing I would like to eliminate here is the possibility that someone misuses the normalized dataset by believing it is a single, deterministic canonical result and feeding it into a hash or signature.
I think we can prevent this by clearly stating that the serialized form is the output of the canonicalization and the normalized dataset is an auxiliary output.
As @dlongley mentioned, I think this only happens when there are automorphisms in the input dataset.
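To make the tie concrete, here is a minimal Python sketch. It is not the spec's algorithm: `first_result`, the `"hash"` and `"issuer"` keys, and the hash value `"abc"` are all hypothetical names and data invented for illustration. It only shows that when two results carry the same hash, a stable sort keeps them in their incoming order, so the "first" result depends on how the list happened to be built.

```python
def first_result(hash_path_list):
    # sorted() is stable: results with equal hashes keep their incoming order,
    # so the winner of a tie depends on the order the list was built in.
    return sorted(hash_path_list, key=lambda r: r["hash"])[0]

# Two hypothetical results that tie on the hash but carry different issuers.
tie_a = {"hash": "abc", "issuer": {"e0": "c14n0", "e1": "c14n1"}}
tie_b = {"hash": "abc", "issuer": {"e0": "c14n1", "e1": "c14n0"}}

# The same two results, collected in a different order.
chosen_1 = first_result([tie_a, tie_b])["issuer"]
chosen_2 = first_result([tie_b, tie_a])["issuer"]
```

Here `chosen_1` and `chosen_2` are different issuers even though the inputs are the same set of results.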
I will continue this "non-deterministic" topic in a new separate PR.
I think it would be better to note that the normalized dataset should not be interpreted as a single canonical representation because the algorithm can output different canonical issuers depending on the implementation or runtime environment. (#89 (comment))
For example, an input dataset
can be transformed into the normalized dataset with either one of the following canonical issuers, depending on the implementation:
{ "e0": "c14n0", "e1": "c14n1" }
{ "e0": "c14n1", "e1": "c14n0" }
Both canonical issuers result in the same single serialized form:
So, we can only say that the serialized form is a single canonical representation; the normalized dataset, together with its canonical issuer, possibly is not.
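A small Python sketch of this point, under stated assumptions: the two-quad dataset and the predicate `<http://example.org/p>` are hypothetical (not the dataset from the original comment), and `relabel` and `serialize` are illustrative helpers rather than any library's API. The dataset is chosen so that swapping `e0` and `e1` is an automorphism.

```python
# Hypothetical input dataset: e0 and e1 point at each other, so swapping
# them yields the same graph (an automorphism).
quads = [
    ("_:e0", "<http://example.org/p>", "_:e1"),
    ("_:e1", "<http://example.org/p>", "_:e0"),
]

def relabel(quads, issuer):
    """Replace each input blank node identifier using the issuer map."""
    def term(t):
        return "_:" + issuer[t[2:]] if t.startswith("_:") else t
    return [tuple(term(t) for t in q) for q in quads]

def serialize(quads):
    """Join the quads as sorted N-Quads-like lines (simplified)."""
    return "".join(sorted(" ".join(q) + " .\n" for q in quads))

# The two canonical issuers from the example above.
issuer_a = {"e0": "c14n0", "e1": "c14n1"}
issuer_b = {"e0": "c14n1", "e1": "c14n0"}

serialized_a = serialize(relabel(quads, issuer_a))
serialized_b = serialize(relabel(quads, issuer_b))
```

With this dataset the two issuers differ while `serialized_a` and `serialized_b` are identical, which is why only the serialized form is safe to use as hash or signature input.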
It doesn't say that the normalized dataset is a single canonical representation; as you point out, the association of blank nodes to input identifiers could be different for two otherwise isomorphic datasets, and therefore the map from input identifier to canonical identifier would differ. Note that this is in a non-normative explanatory detail. Is there some specific text you'd like to add or change?