Add hashes for context and vocab files. #116
Conversation
<tr>
  <td>
    https://w3id.org/security#<br>
    text/html
  </td>
  <td>
    https://w3c.github.io/vc-data-integrity/vocab/security/vocabulary.html
  </td>
</tr>
I am a bit bothered by the inclusion of the HTML text; it may put us on slippery ground. We may get into an argument (within and outside the WG) about whether the HTML specification itself (i.e., its content) is normative or not. Because if it is, then it should be a bona fide HTML Recommendation, going through a WG phase, published on /TR, etc. That could have been a proper approach (using the approach of the Web Annotation Recommendations, see also w3c/vc-data-model#1103 (comment)), but it is too late for that.
I would bypass this issue by leaving this line out of the table entirely. Let's just refer to the JSON-LD version of the vocabulary (which does not imply additional prose, headers, and all that fluff). If we want to be very LD friendly, we can also refer to the Turtle version in parallel, as an equivalent representation, but that is not really necessary either.
I can remove it, but I thought we had consensus to make the HTML files normative. For example, this was accepted into the VCDM spec:
https://w3c.github.io/vc-data-model/#vocabularies
We would have to reverse that PR if we are not going to include the HTML files as normative. We would also have to create a normative section in each spec defining the base classes, which we could do, but we'd effectively be doing that in two places... once in the spec, and again in the vocabulary document (which is do-able, but a bit awkward).
I'm fine w/ including the .ttl file as well, as long as it is always auto-generated from the same source as the .jsonld file, which it is, and I don't see that changing.
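(As a side note, one cheap way to sanity-check that the two auto-generated files stay in sync is to compare the parsed graphs. A minimal sketch, assuming Python with rdflib 6+, which parses JSON-LD natively, and the vocabulary.{jsonld,ttl} file names used elsewhere in this thread:)

```python
# Verify that the auto-generated JSON-LD and Turtle files describe the same
# RDF graph, i.e., that the two serializations have not drifted apart.
from rdflib import Graph
from rdflib.compare import isomorphic

jsonld_graph = Graph().parse("vocabulary.jsonld", format="json-ld")
turtle_graph = Graph().parse("vocabulary.ttl", format="turtle")

# Graph isomorphism ignores serialization order and blank-node labels,
# so this succeeds exactly when the two files encode the same triples.
assert isomorphic(jsonld_graph, turtle_graph), "representations diverged"
```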
I have given some thought to how the table could be changed to make it simpler and avoid the possible process pitfalls. It would also concentrate on the main message.
I propose to replace the second table (and the text leading up to it) with something like (salt and pepper to taste...):
The security vocabulary terms that the JSON-LD contexts listed above resolve to are in the https://w3id.org/security# namespace, i.e., all security terms in this vocabulary are of the form https://w3id.org/security#term. When dereferencing the https://w3id.org/security URL, the data returned depends on HTTP content negotiation, as follows:

| Media type | Hash value of the content | Remark |
| --- | --- | --- |
| text/html | … | Vocabulary in HTML+RDFa [[HTML-RDFA]] |
| application/ld+json | … | Vocabulary in JSON-LD [[JSON-LD]] |
| text/turtle | … | Vocabulary in Turtle [[Turtle]] |
I do not think the issue following the current table is necessary, because the fact that the vocabulary is on GitHub (or not) is not relevant at this point; it is all a matter of content negotiation on the vocabulary URL. As for the example command line, it should be:

curl -H "Accept: <MEDIA TYPE>" https://w3id.org/security | openssl dgst -<DIGEST_ALGORITHM> -binary | openssl base64 -nopad -a
The goal is to make it clearer that the HTML version is not defining anything per se; it is "just" one of three possible encodings of the vocabulary. Per w3c/vc-data-model#1061, the HTML file will have to be reworked so that it "just" refers back to the Security and VCDM specifications, because that is really where the specification of the terms occurs.
Note: content negotiation is not yet fully implemented on w3id.org; this is still to be done. So is the simplification of the current vocabulary.
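For concreteness, here is a rough Python equivalent of that pipeline (an illustrative sketch only: SHA-256 and application/ld+json are picked as one possible <DIGEST_ALGORITHM>/<MEDIA TYPE> instantiation):

```python
# Fetch one representation of the vocabulary and print its digest,
# mirroring the curl | openssl pipeline above.
import base64
import hashlib
import urllib.request

req = urllib.request.Request(
    "https://w3id.org/security",
    headers={"Accept": "application/ld+json"},  # <MEDIA TYPE>
)
with urllib.request.urlopen(req) as response:  # urllib follows the w3id.org redirect
    content = response.read()

digest = hashlib.sha256(content).digest()  # <DIGEST_ALGORITHM> = sha256
# Base64 without padding, matching `openssl base64 -nopad -a`.
print(base64.b64encode(digest).rstrip(b"=").decode())
```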
Fixed in b24a862.
index.html (outdated)
to `https://www.w3.org/ns/security/` or an equally normative and archived
location under W3C control.
</p>
I am a bit puzzled by this remark, and I am not sure where we stand right now. Do we plan to:

1. Set up https://www.w3.org/ns/security/vocabulary.{jsonld,ttl,html}, but continue using w3id.org? Or
2. Same as above, but remove w3id.org from the equation?

In both cases, /ns/security is good to have, because it may serve as a final resting place for the vocabulary files once the development is over, giving it the same level of 'rank' as /TR for the HTML versions. The question is the future role and usage of w3id.org.
(Obviously, this remark is not a show-stopper for the PR, although the note may also say that the w3id.org usage may also change if that is the intention.)
Editor's change: I think I was wrong in saying that /ns is an option in all cases. See #116 (comment) below.
> (Obviously, this remark is not a show-stopper for the PR, although the note may also say that the w3id.org usage may also change if that is the intention.)
That's not the intention. The intention was to say that "the official version is under W3C control and any URL starting with https://w3id.org/security MUST be treated as resolving to the URL hosted by W3C". That is how we can ensure that w3id.org URLs can be used for redirection as things are being incubated, and then standardized without creating disruption to already deployed systems. It's a design pattern that's meant to not create disruption as things are standardized vs. the disruption we'd cause by changing these URLs now.
Removing w3id.org from the equation at this point would break production deployments and already deployed global standards from other standards bodies, such as 1EdTech (CLRv2 and OpenBadges), and Conexxus (Age Verification standard). See these JSON-LD context files (used by finalized standards):
https://purl.imsglobal.org/spec/clr/v1p0/context/clr_v1p0.jsonld (note use of w3id.org URLs and the security vocabularies)
https://convenience-org.github.io/age-verification-context/contexts/age-v1.jsonld (note use of w3id.org URLs and the security vocabularies)
We might be able to do it for terms that are not in production yet (like some of the status list properties), but proof, multibase, digestMultibase, and publicKeyMultibase are already in production (and in use by other standards organizations), so changing them at this point would be disruptive.
@msporny, understood. However, the /ns namespace is usually used as the public, final URL of a vocabulary. I.e., and in spite of my comment above (I was wrong), I do not think option (1) is a good way forward. If the vocabulary URL continues to be w3id.org, then the physical storage of the vocabularies themselves is not really of importance: we can leave them on GitHub while the development is happening, and we can store them in the W3C data space later to provide long-term security. Put another way, w3id.org and www.w3.org/ns/security/ are mutually exclusive, imho.
<td style="white-space: nowrap;">
  https://www.w3.org/ns/multikey/v1<br>
  application/ld+json
</td>
This is the first time I have seen this URL... AFAIK, multikey is not a W3C specification, so what would this file contain?
> This is the first time I have seen this URL... AFAIK, multikey is not a W3C specification, so what would this file contain?
Multikey is defined here:
https://w3c.github.io/vc-data-integrity/#multikey
and is in the Security vocabulary here:
https://w3id.org/security#Multikey
It would contain the contents of this file (for anyone that would want to just include Multikey):
I missed it, sorry. Not sure if we need a separate context for this or whether we can simply merge this with the security context, but that is a separate discussion.
We need a separate context for those that just want to pull the key format into their JSON-LD file. That doesn't mean we can't /also/ put the definitions in the data-integrity context, but we should break just the key context out for those that just want to import the key definitions.
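To illustrate the convenience (a hypothetical sketch: the context URL is the one proposed in the diff above, and the identifier and key values are made-up placeholders), a document that only needs the key terms could then import just that context:

```python
# Hypothetical verification-method document importing only the proposed
# multikey context rather than the full data-integrity context. All values
# below are illustrative placeholders.
multikey_document = {
    "@context": "https://www.w3.org/ns/multikey/v1",
    "id": "did:example:123#key-1",
    "type": "Multikey",
    "controller": "did:example:123",
    "publicKeyMultibase": "z6MkhaXgBZDvotDkL5257faiztiGiC2QtKLGpbnnEGta2doK",
}
```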
I am fine with the "also" :-).
  timestamps to local time values are expected to consider the time zone
  expectations of the individual. See
- <a data-cite="VC-DATA-MODEL-2.0#representing-time"></a> for more details about
+ <a data-cite="?VC-DATA-MODEL-2.0#representing-time"></a> for more details about
I haven't checked, but Respec should be smart enough to notice that this is inside a Note and so put it in the informative references section even if you don't add the ?. If it's not doing that, we should file a bug.
Yes, I thought the same and was surprised when the informative reference jumped to the normative references in the appendix. At that point, I scanned through the document and manually set all references to informative (that were supposed to be informative references)... and the problem went away.
I didn't then go back and try to reproduce it by removing the "?" characters from data-cite. I guess I'll try that now as I have the spec up in another window...
Hmm, so, it worked... I tried two things:

1. Set the ? above manually in data-cite, and it moved the reference from normative to informative.
2. Set the section as class="informative", and it moved the reference from normative to informative.
So, ReSpec seems to be operating as expected. I think the issue might be that the appendix is marked as normative, when it should be informative... so, I'll fix that when I go back through and clean up this PR.
Thanks for the nudge to track down the issue @jyasskin.
<p>
Implementations that perform JSON-LD processing MUST treat the following
JSON-LD context URLs as already resolved, where the resolved document matches
the corresponding hash values below:
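One possible reading of that requirement in code (a minimal sketch, not normative: the context URL, hash, and file name below are placeholders, and a real implementation would plug such a resolver into its JSON-LD library's document loader, e.g., via pyld's jsonld.set_document_loader):

```python
# Serve JSON-LD contexts from bundled copies whose hashes are pinned, and
# never consult the network: the "treat as already resolved" rule.
import base64
import hashlib
import json

# url -> (expected unpadded base64 SHA-256 from the spec, bundled bytes).
# Both values here are placeholders, not taken from the specification.
BUNDLED_CONTEXTS = {
    "https://w3id.org/security/data-integrity/v1": (
        "EXPECTED_HASH_FROM_SPEC",
        open("data-integrity-v1.jsonld", "rb").read(),
    ),
}

def resolve_context(url: str) -> dict:
    """Return the bundled context for url after verifying its pinned hash."""
    if url not in BUNDLED_CONTEXTS:
        raise ValueError(f"refusing to fetch unknown context: {url}")
    expected, raw = BUNDLED_CONTEXTS[url]
    actual = base64.b64encode(hashlib.sha256(raw).digest()).rstrip(b"=").decode()
    if actual != expected:
        raise ValueError(f"bundled copy of {url} fails its hash check")
    return json.loads(raw)
```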
IIUC, these hashes are here to defend against the possibility that the URL starts resolving to a different document during the lifetime of the standard. But if it does that, implementers should still be able to implement based on the contents of this document, and having only a hash will make that difficult. Can you embed the expected document into this spec, maybe as a set of appendices?
Yes, your understanding is correct. It's even less of a problem w/ the HTML files as those are the human-readable versions of the machine-readable JSON-LD documents. For this specific section, it's the JSON-LD ones that matter, but only if you're doing some advanced RDF processing. So, @iherman is arguing that we should just not list the HTML files and only include the JSON-LD ones.
We could include these files in the specification, but these documents are BIG, and they're going to be archived at W3C at date-stamped URLs, and in the spec repositories on Github, and in the Arctic code vault. The likelihood of them going away completely is exceedingly small, so putting them in the spec feels a bit like overkill. We could do it, but people are already complaining about the length of the spec (which shouldn't matter, but people seem to associate a negative outlook with the technology when the spec gets big). @iherman has suggested that maybe we could hide everything in a details element (which we could).
All this to say, it's an active area of discussion (do we embed stuff in the spec, or refer to an external file (that's archived at W3C) and use a cryptographic hash and URL to refer to it).
Interested in your thoughts, @jyasskin, given the above.
I don't know JSON-LD or RDF well, so this could be wrong, but I think that we can make a distinction between the contexts and the documents with RDFa data in them. To process JSON-LD at all—to get the IDs right—you have to process the context. The RDFa information, on the other hand, is only for if you want to situate the credential into a wider semantic web, right? That seems to imply that this specification should include the contexts but omit the RDFa. The contexts seem short enough to embed without making the spec overly long?
> I don't know JSON-LD or RDF well, so this could be wrong, but I think that we can make a distinction between the contexts and the documents with RDFa data in them. To process JSON-LD at all—to get the IDs right—you have to process the context. The RDFa information, on the other hand, is only for if you want to situate the credential into a wider semantic web, right? That seems to imply that this specification should include the contexts but omit the RDFa. The contexts seem short enough to embed without making the spec overly long?
I do not think we were ever considering including the RDFa. Instead, we were considering the JSON-LD representation of the vocabulary; in this respect, it is more akin to the context file. And, actually, they are of more or less equal length.
(Maybe you misspelled, and you meant RDF and not RDFa, @jyasskin?)
> The contexts seem short enough to embed without making the spec overly long?
Yes, we can also do this, but I question whether we need to... it presumes that w3.org goes away and, simultaneously, the entire development ecosystem goes away, or that people don't also have this information (the JSON-LD Contexts) copied statically in a variety of different software libraries (because we tell them to do just that in the spec).
The reality is that w3.org is maintained and isn't going away any time soon. If it /does/ go away, we have the Github repo. If Github goes away we have the Arctic Code Vault and the Library of Congress as backups... and if all of those things go away, we have all the developers that have backed up the context in their libraries. I guess the cost isn't much to place it in our spec, and we can include it in a collapsed details element... but that presumes such a catastrophic series of events that I question adding yet another redundancy for the many we already have in place.
All this to say, are we really that worried about w3.org going away, and all the software repositories that have a backup of the context file going away, in such a way that makes it impossible to re-create the JSON-LD context? It feels like we're preparing for an extremely unlikely event.
index.html (outdated)
text/html
</td>
<td>
https://w3c.github.io/vc-data-integrity/vocab/security/vocabulary.html
I think the w3id.org URLs are only used as IDs, and when I tried to read the JSON-LD spec, I couldn't find anywhere that the document they resolve to had any normative impact. Did I misread that, and implementers actually have to care about the content? If they don't have to care, you can probably omit the content URL.
Yes, correct, the w3id.org URLs are only used as IDs.
The document they resolve to might have more machine-readable information at it that can be used by an RDF reasoner... but that's not a requirement. The only requirement is that it establishes a globally unambiguous identifier.
In the case of vc-data-model and vc-data-integrity, we /do/ put machine-readable information at the end of the IDs, and we also state explicitly what should be at the end of those URLs and that implementations must treat the URL as resolved (per your guidance, @jyasskin), thus ensuring that the identifiers resolve to the machine-readable information that the WG defined.
As for w3id.org, it exists as a redirection service (for over 670 projects). One usage of the service is so that we can establish stable identifiers for vocabulary documents while they're being incubated in W3C CGs, and continue to use those identifiers as the specification goes from experimentation, to incubation, to adoption by an official WG, all the way to a W3C REC. As a specification goes through those processes (and migrates from an organization, to a community group, to a working group, to a TR), we update the w3id.org redirect to the new home/location of the specification over time.
This has helped us not disrupt organizations that are deploying this stuff to production on timelines that are faster than the W3C standards process, which is the case for a variety of production deployments of Verifiable Credentials and Data Integrity.
Yeah, if the meaning of a credential could depend on the content behind a w3id.org URL, I'd worry more about who's hosting and controlling that content, but since it's just optional extra RDF triples that don't affect the things this specification defines, I don't think it's even necessary to pin those documents with a hash.
If the WG does want to nail down the set of triples, one option would be to put the triples in an appendix, and say that these URLs can be assumed to return content that embeds the triples, without saying anything about the human-readable prose around them. Again, I don't see a need to do that, but it could be a way to specify the stuff that programs can read without dramatically increasing the size of this document.
> If the WG does want to nail down the set of triples, one option would be to put the triples in an appendix, and say that these URLs can be assumed to return content that embeds the triples, without saying anything about the human-readable prose around them.
Just to understand what you mean: isn't this saying that the vocabulary, in JSON-LD, should be in the appendix? The vocabulary is a bunch of triples after all...
@jyasskin wrote:
> Yeah, if the meaning of a credential could depend on the content behind a w3id.org URL, I'd worry more about who's hosting and controlling that content, but since it's just optional extra RDF triples that don't affect the things this specification defines, I don't think it's even necessary to pin those documents with a hash.
Yes, correct, the vocabulary content behind a w3id.org URL is not processed at all during runtime. We're just providing the information in the case someone wants to do formal verification or post-processing of some kind after-the-fact. If someone were to want to process that stuff during runtime (I don't know why -- perhaps for semantic linting purposes, but presuming they do), then they can just use the static machine-readable file (referred to by hash and possibly embedded in the spec) to do so.
> If the WG does want to nail down the set of triples, one option would be to put the triples in an appendix, and say that these URLs can be assumed to return content that embeds the triples, without saying anything about the human-readable prose around them. Again, I don't see a need to do that, but it could be a way to specify the stuff that programs can read without dramatically increasing the size of this document.
Yeah, there are a number of workable ways we could do this... we just need to pick one and go and do it. Publishing the JSON-LD representation of the vocabulary is probably the right balance. We could also publish the TURTLE representation as well given that both files are auto-generated from the vocabulary.yml file. This is effectively "publishing the triples" as you noted above, @jyasskin.
Important. Good to see this.
Just a singular/plural agreement fix between a subject and verb.
Co-authored-by: Ted Thibodeau Jr <tthibodeau@openlinksw.com>
Normative, multiple reviews, changes requested and made, no objections, merging.
This PR addresses issue #115 by normatively referring to the appropriate JSON-LD Context files and vocabulary documents. This is similar to the language in the VC v2.0 specification:
https://w3c.github.io/vc-data-model/#base-context
... but follows @jyasskin's suggestion on treating values as already dereferenced. Where it makes sense, cryptographic hashes are provided and implementers are instructed to NOT load the documents from the Web, but rather treat them as already dereferenced with the contents of the files identified by cryptographic hash.
@iherman, pay particular attention to the cryptographic hash value for the machine-readable vocabulary and the normative statement related to it. If you are ok with that approach here, I can do something similar for the VC v2.0 specification.