Add hashes for context and vocab files. #116
Conversation
<tr>
  <td>
    https://w3id.org/security#<br>
    text/html
  </td>
  <td>
    https://w3c.github.io/vc-data-integrity/vocab/security/vocabulary.html
  </td>
</tr>
I am a bit bothered by the inclusion of the HTML text; it may put us on slippery ground. We may get into an argument (within and outside the WG) about whether the HTML specification itself (i.e., its content) is normative or not. Because if it is, then it should be a bona fide HTML Recommendation, going through a WG phase, published on /TR, etc. That could have been a proper approach (using the approach of the Web Annotation Recommendations, see also w3c/vc-data-model#1103 (comment)), but it is too late for that.
I would bypass this issue by leaving this line out of the table entirely. Let's just refer to the JSON-LD version of the vocabulary (which does not imply additional prose, headers, and all that fluff). If we want to be very LD friendly, we can also refer to the Turtle version in parallel, as an equivalent representation, but that is not really necessary either.
I can remove it, but I thought we had consensus to make the HTML files normative. For example, this was accepted into the VCDM spec:
https://w3c.github.io/vc-data-model/#vocabularies
We would have to reverse that PR if we are not going to include the HTML files as normative. We would also have to create a normative section in each spec defining the base classes, which we could do, but we'd effectively be doing that in two places... once in the spec, and again in the vocabulary document (which is do-able, but a bit awkward).
I'm fine w/ including the .ttl file as well, as long as it is always auto-generated from the same source as the .jsonld file, which it is, and I don't see that changing.
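(As a side note, one cheap way to sanity-check that the two auto-generated files stay in sync is to compare the parsed graphs. A minimal sketch, assuming Python with rdflib 6+, which parses JSON-LD natively, and the vocabulary.{jsonld,ttl} file names used elsewhere in this thread:)

```python
# Verify that the auto-generated JSON-LD and Turtle files describe the same
# RDF graph, i.e., that the two serializations have not drifted apart.
from rdflib import Graph
from rdflib.compare import isomorphic

jsonld_graph = Graph().parse("vocabulary.jsonld", format="json-ld")
turtle_graph = Graph().parse("vocabulary.ttl", format="turtle")

# Graph isomorphism ignores serialization order and blank-node labels,
# so this succeeds exactly when the two files encode the same triples.
assert isomorphic(jsonld_graph, turtle_graph), "representations diverged"
```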
I have given some thought to how the table could be changed to make it simpler and avoid the possible process pitfalls. It would also concentrate on the main message.
I propose to replace the second table (and the text leading up to it) with something like (salt and pepper to taste...):
The security vocabulary terms that the JSON-LD contexts listed above resolve to are in the https://w3id.org/security# namespace, i.e., all security terms in this vocabulary are of the form https://w3id.org/security#term. When dereferencing the https://w3id.org/security URL, the data returned depends on HTTP content negotiation, as follows:

| Media type | Hash value of the content | Remark |
| --- | --- | --- |
| text/html | … | Vocabulary in HTML+RDFa [[HTML-RDFA]] |
| application/ld+json | … | Vocabulary in JSON-LD [[JSON-LD]] |
| text/turtle | … | Vocabulary in Turtle [[Turtle]] |
I do not think the issue following the current table is necessary, because the fact that the vocabulary is on GitHub (or not) is not relevant at this point; it is all a matter of content negotiation on the vocabulary URL. As for the example command line, it should be:

curl -H "Accept: <MEDIA TYPE>" https://w3id.org/security | openssl dgst -<DIGEST_ALGORITHM> -binary | openssl base64 -nopad -a
The goal is to make it clearer that the HTML version is not defining anything per se; it is "just" one of three possible encodings of the vocabulary. Per w3c/vc-data-model#1061, the HTML file will have to be reworked so that it "just" refers back to the Security and VCDM specifications, because that is really where the specification of the terms occurs.
Note: content negotiation is not yet fully implemented on w3id.org; this is still to be done. So is the simplification of the current vocabulary.
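For concreteness, here is a rough Python equivalent of that pipeline (an illustrative sketch only: SHA-256 and application/ld+json are picked as one possible <DIGEST_ALGORITHM>/<MEDIA TYPE> instantiation):

```python
# Fetch one representation of the vocabulary and print its digest,
# mirroring the curl | openssl pipeline above.
import base64
import hashlib
import urllib.request

req = urllib.request.Request(
    "https://w3id.org/security",
    headers={"Accept": "application/ld+json"},  # <MEDIA TYPE>
)
with urllib.request.urlopen(req) as response:  # urllib follows the w3id.org redirect
    content = response.read()

digest = hashlib.sha256(content).digest()  # <DIGEST_ALGORITHM> = sha256
# Base64 without padding, matching `openssl base64 -nopad -a`.
print(base64.b64encode(digest).rstrip(b"=").decode())
```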
Fixed in b24a862.
index.html (outdated)
to `https://www.w3.org/ns/security/` or an equally normative and archived
location under W3C control.
</p>
I am a bit puzzled by this remark, and I am not sure where we stand right now. Do we plan to:

1. Set up https://www.w3.org/ns/security/vocabulary.{jsonld,ttl,html}, but continue using w3id.org? Or
2. Same as above, but remove w3id.org from the equation?

In both cases, /ns/security is good to have, because it may serve as a final resting place for the vocabulary files once the development is over, giving it the same level of 'rank' as /TR for the HTML versions. The question is the future role and usage of w3id.org.
(Obviously, this remark is not a show-stopper for the PR, although the note may also say that the w3id.org usage may also change if that is the intention.)
Editor's change: I think I was wrong in saying that /ns is an option in all cases. See #116 (comment) below.
> (Obviously, this remark is not a show-stopper for the PR, although the note may also say that the w3id.org usage may also change if that is the intention.)
That's not the intention. The intention was to say that "the official version is under W3C control and any URL starting with https://w3id.org/security MUST be treated as resolving to the URL hosted by W3C". That is how we can ensure that w3id.org URLs can be used for redirection as things are being incubated, and then standardized without creating disruption to already deployed systems. It's a design pattern that's meant to not create disruption as things are standardized vs. the disruption we'd cause by changing these URLs now.
Removing w3id.org from the equation at this point would break production deployments and already deployed global standards from other standards bodies, such as 1EdTech (CLRv2 and OpenBadges), and Conexxus (Age Verification standard). See these JSON-LD context files (used by finalized standards):
https://purl.imsglobal.org/spec/clr/v1p0/context/clr_v1p0.jsonld (note use of w3id.org URLs and the security vocabularies)
https://convenience-org.github.io/age-verification-context/contexts/age-v1.jsonld (note use of w3id.org URLs and the security vocabularies)
We might be able to do it for terms that are not in production yet (like some of the status list properties), but proof, multibase, digestMultibase, and publicKeyMultibase are already in production (and in use by other standards organizations), so changing them at this point would be disruptive.
@msporny, understood. However, the /ns namespace is usually used as the public, final URL of a vocabulary. I.e., and in spite of my comment above (I was wrong), I do not think option (1) is a good way forward. If the vocabulary URL continues to be w3id.org, then the physical storage of the vocabularies themselves is not really of importance: we can leave them on GitHub while the development is happening, and we can store them in the W3C data space later to provide long-term security. Put another way, w3id.org and www.w3.org/ns/security/ are mutually exclusive, imho.
<td style="white-space: nowrap;">
  https://www.w3.org/ns/multikey/v1<br>
  application/ld+json
</td>
This is the first time I have seen this URL... AFAIK, multikey is not a W3C specification, so what would this file contain?
> This is the first time I have seen this URL... AFAIK, multikey is not a W3C specification, so what would this file contain?
Multikey is defined here:
https://w3c.github.io/vc-data-integrity/#multikey
and is in the Security vocabulary here:
https://w3id.org/security#Multikey
It would contain the contents of this file (for anyone that would want to just include Multikey):
I missed it, sorry. Not sure if we need a separate context for this or whether we can simply merge this with the security context, but that is a separate discussion.
We need a separate context for those that just want to pull the key format into their JSON-LD file. That doesn't mean we can't /also/ put the definitions in the data-integrity context, but we should break just the key context out for those that just want to import the key definitions.
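To illustrate the convenience (a hypothetical sketch: the context URL is the one proposed in the diff above, and the identifier and key values are made-up placeholders), a document that only needs the key terms could then import just that context:

```python
# Hypothetical verification-method document importing only the proposed
# multikey context rather than the full data-integrity context. All values
# below are illustrative placeholders.
multikey_document = {
    "@context": "https://www.w3.org/ns/multikey/v1",
    "id": "did:example:123#key-1",
    "type": "Multikey",
    "controller": "did:example:123",
    "publicKeyMultibase": "z6MkhaXgBZDvotDkL5257faiztiGiC2QtKLGpbnnEGta2doK",
}
```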
I am fine with the "also" :-).
  timestamps to local time values are expected to consider the time zone
  expectations of the individual. See
- <a data-cite="VC-DATA-MODEL-2.0#representing-time"></a> for more details about
+ <a data-cite="?VC-DATA-MODEL-2.0#representing-time"></a> for more details about
I haven't checked, but Respec should be smart enough to notice that this is inside a Note and so put it in the informative references section even if you don't add the ?. If it's not doing that, we should file a bug.
Yes, I thought the same and was surprised when the informative reference jumped to the normative references in the appendix. At that point, I scanned through the document and manually set all references to informative (that were supposed to be informative references)... and the problem went away.
I didn't then go back and try to reproduce it by removing the "?" characters from data-cite. I guess I'll try that now as I have the spec up in another window...
Hmm, so, it worked... I tried two things:

1. Set the ? above manually in data-cite, and it moved the reference from normative to informative.
2. Set the section as class="informative", and it moved the reference from normative to informative.
So, ReSpec seems to be operating as expected. I think the issue might be that the appendix is marked as normative, when it should be informative... so, I'll fix that when I go back through and clean up this PR.
Thanks for the nudge to track down the issue @jyasskin.
<p>
Implementations that perform JSON-LD processing MUST treat the following
JSON-LD context URLs as already resolved, where the resolved document matches
the corresponding hash values below:
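One possible reading of that requirement in code (a minimal sketch, not normative: the context URL, hash, and file name below are placeholders, and a real implementation would plug such a resolver into its JSON-LD library's document loader, e.g., via pyld's jsonld.set_document_loader):

```python
# Serve JSON-LD contexts from bundled copies whose hashes are pinned, and
# never consult the network: the "treat as already resolved" rule.
import base64
import hashlib
import json

# url -> (expected unpadded base64 SHA-256 from the spec, bundled bytes).
# Both values here are placeholders, not taken from the specification.
BUNDLED_CONTEXTS = {
    "https://w3id.org/security/data-integrity/v1": (
        "EXPECTED_HASH_FROM_SPEC",
        open("data-integrity-v1.jsonld", "rb").read(),
    ),
}

def resolve_context(url: str) -> dict:
    """Return the bundled context for url after verifying its pinned hash."""
    if url not in BUNDLED_CONTEXTS:
        raise ValueError(f"refusing to fetch unknown context: {url}")
    expected, raw = BUNDLED_CONTEXTS[url]
    actual = base64.b64encode(hashlib.sha256(raw).digest()).rstrip(b"=").decode()
    if actual != expected:
        raise ValueError(f"bundled copy of {url} fails its hash check")
    return json.loads(raw)
```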
IIUC, these hashes are here to defend against the possibility that the URL starts resolving to a different document during the lifetime of the standard. But if it does that, implementers should still be able to implement based on the contents of this document, and having only a hash will make that difficult. Can you embed the expected document into this spec, maybe as a set of appendices?
Yes, your understanding is correct. It's even less of a problem w/ the HTML files as those are the human-readable versions of the machine-readable JSON-LD documents. For this specific section, it's the JSON-LD ones that matter, but only if you're doing some advanced RDF processing. So, @iherman is arguing that we should just not list the HTML files and only include the JSON-LD ones.
We could include these files in the specification, but these documents are BIG, and they're going to be archived at W3C at date-stamped URLs, and in the spec repositories on Github, and in the Arctic code vault. The likelihood of them going away completely is exceedingly small, so putting them in the spec feels a bit like overkill. We could do it, but people are already complaining about the length of the spec (which shouldn't matter, but people seem to associate a negative outlook with the technology when the spec gets big). @iherman has suggested that maybe we could hide everything in a details element (which we could).
All this to say, it's an active area of discussion (do we embed stuff in the spec, or refer to an external file (that's archived at W3C) and use a cryptographic hash and URL to refer to it).
Interested in your thoughts, @jyasskin, given the above.
I don't know JSON-LD or RDF well, so this could be wrong, but I think that we can make a distinction between the contexts and the documents with RDFa data in them. To process JSON-LD at all—to get the IDs right—you have to process the context. The RDFa information, on the other hand, is only for if you want to situate the credential into a wider semantic web, right? That seems to imply that this specification should include the contexts but omit the RDFa. The contexts seem short enough to embed without making the spec overly long?
> I don't know JSON-LD or RDF well, so this could be wrong, but I think that we can make a distinction between the contexts and the documents with RDFa data in them. To process JSON-LD at all—to get the IDs right—you have to process the context. The RDFa information, on the other hand, is only for if you want to situate the credential into a wider semantic web, right? That seems to imply that this specification should include the contexts but omit the RDFa. The contexts seem short enough to embed without making the spec overly long?
I do not think we were ever considering including the RDFa. Instead, we were considering the JSON-LD representation of the vocabulary; in this respect, it is more akin to the context file. And, actually, they are of more or less equal length.
(Maybe you misspelled, and you meant RDF and not RDFa, @jyasskin?)
> The contexts seem short enough to embed without making the spec overly long?
Yes, we can also do this, but I question whether we need to... it presumes that w3.org goes away and, simultaneously, the entire development ecosystem goes away, or that people don't also have this information (the JSON-LD Contexts) copied statically in a variety of different software libraries (because we tell them to do just that in the spec).
The reality is that w3.org is maintained and isn't going away any time soon. If it /does/ go away, we have the Github repo. If Github goes away we have the Arctic Code Vault and the Library of Congress as backups... and if all of those things go away, we have all the developers that have backed up the context in their libraries. I guess the cost isn't much to place it in our spec, and we can include it in a collapsed details element... but that presumes such a catastrophic series of events that I question adding yet another redundancy for the many we already have in place.
All this to say, are we really that worried about w3.org going away, and all the software repositories that have a backup of the context file going away, in such a way that makes it impossible to re-create the JSON-LD context? It feels like we're preparing for an extremely unlikely event.
index.html (outdated)
text/html
</td>
<td>
https://w3c.github.io/vc-data-integrity/vocab/security/vocabulary.html
I think the w3id.org URLs are only used as IDs, and when I tried to read the JSON-LD spec, I couldn't find anywhere that the document they resolve to had any normative impact. Did I misread that, and implementers actually have to care about the content? If they don't have to care, you can probably omit the content URL.
Yes, correct, the w3id.org URLs are only used as IDs.
The document they resolve to might have more machine-readable information at it that can be used by an RDF reasoner... but that's not a requirement. The only requirement is that it establishes a globally unambiguous identifier.
In the case of vc-data-model and vc-data-integrity, we /do/ put machine-readable information at the end of the IDs, and we also state explicitly what should be at the end of those URLs and that implementations must treat the URL as resolved (per your guidance, @jyasskin), thus ensuring that the identifiers resolve to the machine-readable information that the WG defined.
As for w3id.org, it exists as a redirection service (for over 670 projects). One usage of the service is so that we can establish stable identifiers for vocabulary documents while they're being incubated in W3C CGs, and continue to use those identifiers as the specification goes from experimentation, to incubation, to adoption by an official WG, all the way to a W3C REC. As a specification goes through those processes (and migrates from an organization, to a community group, to a working group, to a TR), we update the w3id.org redirect to the new home/location of the specification over time.
This has helped us not disrupt organizations that are deploying this stuff to production on timelines that are faster than the W3C standards process, which is the case for a variety of production deployments of Verifiable Credentials and Data Integrity.
Yeah, if the meaning of a credential could depend on the content behind a w3id.org URL, I'd worry more about who's hosting and controlling that content, but since it's just optional extra RDF triples that don't affect the things this specification defines, I don't think it's even necessary to pin those documents with a hash.
If the WG does want to nail down the set of triples, one option would be to put the triples in an appendix, and say that these URLs can be assumed to return content that embeds the triples, without saying anything about the human-readable prose around them. Again, I don't see a need to do that, but it could be a way to specify the stuff that programs can read without dramatically increasing the size of this document.
> If the WG does want to nail down the set of triples, one option would be to put the triples in an appendix, and say that these URLs can be assumed to return content that embeds the triples, without saying anything about the human-readable prose around them.
Just to understand what you mean: isn't this saying that the vocabulary, in JSON-LD, should be in the appendix? The vocabulary is a bunch of triples after all...
@jyasskin wrote:
> Yeah, if the meaning of a credential could depend on the content behind a w3id.org URL, I'd worry more about who's hosting and controlling that content, but since it's just optional extra RDF triples that don't affect the things this specification defines, I don't think it's even necessary to pin those documents with a hash.
Yes, correct, the vocabulary content behind a w3id.org URL is not processed at all during runtime. We're just providing the information in the case someone wants to do formal verification or post-processing of some kind after-the-fact. If someone were to want to process that stuff during runtime (I don't know why -- perhaps for semantic linting purposes, but presuming they do), then they can just use the static machine-readable file (referred to by hash and possibly embedded in the spec) to do so.
> If the WG does want to nail down the set of triples, one option would be to put the triples in an appendix, and say that these URLs can be assumed to return content that embeds the triples, without saying anything about the human-readable prose around them. Again, I don't see a need to do that, but it could be a way to specify the stuff that programs can read without dramatically increasing the size of this document.
Yeah, there are a number of workable ways we could do this... we just need to pick one and go and do it. Publishing the JSON-LD representation of the vocabulary is probably the right balance. We could also publish the TURTLE representation as well given that both files are auto-generated from the vocabulary.yml file. This is effectively "publishing the triples" as you noted above, @jyasskin.
Important. Good to see this.
Just a singular/plural agreement fix between a subject and verb.
Co-authored-by: Ted Thibodeau Jr <tthibodeau@openlinksw.com>
Normative, multiple reviews, changes requested and made, no objections, merging.
This PR addresses issue #115 by normatively referring to the appropriate JSON-LD Context files and vocabulary documents. This is similar to the language in the VC v2.0 specification:
https://w3c.github.io/vc-data-model/#base-context
... but follows @jyasskin's suggestion on treating values as already dereferenced. Where it makes sense, cryptographic hashes are provided and implementers are instructed to NOT load the documents from the Web, but rather treat them as already dereferenced with the contents of the files identified by cryptographic hash.
@iherman, pay particular attention to the cryptographic hash value for the machine-readable vocabulary and the normative statement related to it. If you are ok with that approach here, I can do something similar for the VC v2.0 specification.