Default vocab for credentials context v2 #753

Closed
tplooker opened this issue Nov 15, 2020 · 64 comments

@tplooker

tplooker commented Nov 15, 2020

JSON-LD defines a way to specify the default vocabulary for properties and types that otherwise do not match a term definition. Currently the credentials context v1 does not make use of this feature, meaning producers of VCs must be explicit in their term declarations for all properties used; otherwise the JSON-LD Signatures signing process will error, due to the use of a strict expansion map that requires all properties to be defined.

If instead a default vocabulary was included in a new revision of the credentials context, then the signing process would proceed and the properties without a formal term definition would be expanded with the default vocab. Take for instance the following example where the credentials context v2 would include the definition of "@vocab" set to "https://www.w3.org/2018/credentials/undefinedTerm"

{
  "@context": [
    "https://www.w3.org/2018/credentials/v2"
  ],
  "type": ["VerifiableCredential"],
  "issuer": "did:example:28394728934792387",
  "issuanceDate": "2019-12-03T12:19:52Z",
  "expirationDate": "2029-12-03T12:19:52Z",
  "credentialSubject": {
    "id": "did:example:b34ca6cd37bbf23",
    "someClaim": "A claim",
    "someOtherClaim": "Another claim"
  },
  "proof": {
     "type": "Ed25519Signature2018",
     "created": "2020-01-30T03:32:15Z",
     "jws": "eyJhbGciOiJFZERTQSIsI...wRG2fNmAx60Vi4Ag",
     "proofPurpose": "assertionMethod",
     "verificationMethod": "did:example:28394728934792387#keys-7f83he7s8"
  }
}

During signing, the properties "someClaim" and "someOtherClaim" would be expanded to https://www.w3.org/2018/credentials/undefinedTerm#someClaim and https://www.w3.org/2018/credentials/undefinedTerm#someOtherClaim, which, when dereferenced, would resolve to a page informing the user that the property is not formally defined, perhaps with links to helpful resources on how to define the term formally.
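
To make the proposed behaviour concrete, here is a minimal sketch of the fallback (not the real JSON-LD expansion algorithm; the v2 context and the undefinedTerm IRI are only proposals in this issue, and the term definitions below are invented stand-ins):

```javascript
// Simplified model of "@vocab" fallback during term expansion. This is
// an illustration of the proposal only: the credentials v2 context does
// not exist, and the undefinedTerm IRI is hypothetical. Real processors
// (e.g. jsonld.js) implement the full JSON-LD expansion algorithm.
const DEFAULT_VOCAB = 'https://www.w3.org/2018/credentials/undefinedTerm#';

// A tiny stand-in for the term definitions a context would provide.
const termDefinitions = {
  issuer: 'https://www.w3.org/2018/credentials#issuer',
  credentialSubject: 'https://www.w3.org/2018/credentials#credentialSubject',
};

// Defined terms expand to their IRIs; undefined terms fall back to @vocab.
function expandTerm(term) {
  return termDefinitions[term] ?? DEFAULT_VOCAB + term;
}

console.log(expandTerm('someClaim'));
// -> https://www.w3.org/2018/credentials/undefinedTerm#someClaim
```

With a default vocab in place, expansion never fails on an unknown key; the unknown key is instead preserved under a namespace that advertises it as undefined.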

@tplooker (Author)

Interested in feedback and thoughts from @msporny @dlongley @OR13

@OR13 (Contributor)

OR13 commented Nov 16, 2020

I think defaulting is probably slightly better than dropping properties, but throwing an error remains the best thing to do here.

The question remains, why would you sign anything you don't understand?

If we added @vocab it would need to come with some strong warnings, and I would want pretty much all processors to throw instead of warn...

Right now, using undefined terms is more likely to cause an error.

I'm not sure if this is an improvement, but if "undefinedTerm" had a human-readable definition that said "don't trust anything from this issuer until they define this term"... I suppose that would be an improvement over what we have today.

@David-Chadwick (Contributor)

This is not a new problem or issue. LDAP introduced the extensibleObject object class for the same reason, so that undefined attributes could be added to an LDAP entry. Personally I don't like this. I think all VCs that cannot be fully understood should be rejected. This is because, unlike LDAP entries, VCs are primarily security tokens, and having unknown information in a security token is dangerous. If you want to address this in a secure way, you should consider doing something like X.509 with its critical extension feature. With this feature, every attribute at the outer level is fully understood, even if an inner value is not, and the outer level says whether you can safely ignore inner values or not.

@OR13 (Contributor)

OR13 commented Nov 16, 2020

@tplooker and I had a chat about this... I can see the advantage of "making the tent bigger" by welcoming folks who don't understand why signing unknown properties might not be a good idea (weakening the security model of VCs), if we can add the following to the spec:

https://www.w3.org/2018/credentials/v2 documents MAY contain properties that are defaulted to "https://www.w3.org/2018/credentials/undefinedTerm#someClaim" via the use of @vocab.

A JSON-LD Processor SHOULD throw an error when the @vocab is used.

A JSON-LD Processor MUST warn if no error is thrown and @vocab is used.

https://www.w3.org/2018/credentials/undefinedTerm leads to a definition which warns that the document containing this term cannot be trusted, and is likely experimental or test data.

Reasons this would be better than what we have today:

  1. Including schema.org essentially does this already, and silently defaults undefined terms to https://schema.org/someClaim
  2. Many JSON-LD VC processors already check for https://www.w3.org/2018/credentials/v1; when they check for https://www.w3.org/2018/credentials/v2 we can require them to handle errors better, and we SHOULD.
  3. Some properties, like type, are already dropped silently by some (not all) libraries; see "Add feature to throw error when unmapped elements are dropped during processing" digitalbazaar/jsonld.js#199
  4. We are losing the ability to protect users from undefined terms through defense in depth:

Throw Error > Default Member and Produce Warning > Default Member > Drop Property and Produce Warning > Drop Property.

We should start at the top, and only compromise when the risk to users is outweighed by the convenience to developers, and we are honest about which constituency we care more about in the spec.
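
That ordering can be sketched as an explicit processor policy list, strictest first (the policy names here are invented for illustration; no spec defines them):

```javascript
// OR13's preference ordering for handling undefined terms, strictest
// first. The policy names are illustrative only, not from any spec.
const POLICIES = [
  'throw-error',
  'default-member-and-warn',
  'default-member',
  'drop-property-and-warn',
  'drop-property',
];

// A policy is stricter if it appears earlier in the list.
function stricterThan(a, b) {
  return POLICIES.indexOf(a) < POLICIES.indexOf(b);
}
```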

@tplooker (Author)

Another option we discussed is for the default vocab to be defined as a security term that is present in both the credentials and security contexts, making it a concept at the JSON-LD Signatures level rather than just a verifiable credentials concept (so https://www.w3.org/security/undefinedTerm instead of https://www.w3.org/2018/credentials/undefinedTerm as the default vocab).

@peacekeeper (Contributor)

I don't know.. https://www.w3.org/2018/credentials/undefinedTerm#someClaim feels like a contradiction to me.

This sounds like someClaim is "undefined", and I understand this is exactly the intention, but now that term can actually be expanded just fine and there won't be any warning or error, since the default vocabulary makes it all 100% correct JSON-LD, no?

If there's an undefined term in your JSON-LD, the processor should simply throw an error...

@tplooker (Author)

@peacekeeper I understand that it appears as a contradiction.

To be clear I think in general JSON-LD processors SHOULD at least emit a warning when any property is expanded with @vocab so that this behaviour can be caught regardless.

@OR13 (Contributor)

OR13 commented Nov 17, 2020

If we are talking about a spec where we get to make normative requirements, I would repeat my assertions:

  1. JSON-LD Processors SHOULD throw an ERROR
  2. JSON-LD Processors MUST emit a warning if they don't throw an error

This is a spec related to cryptographic assertions / security... swallowing errors/warnings is a terrible idea from a security perspective... I can understand developers from untyped languages like JavaScript or Python feeling like types are a "burden" and "slow me down / are confusing"... but frankly... they are wrong :)

types and strictness are part of good security engineering... we should not be failing open... we should be failing closed.

https://cwe.mitre.org/data/definitions/636.html

@tplooker (Author)

tplooker commented Nov 17, 2020

This issue goes right to the core of the extensibility model for verifiable credentials. If we look at neighbouring technologies such as JWT, they support what are known as public and private claims: public claims are recommended to be registered in the IANA registry; private claims are subject to collision and should be used with caution. At the moment we force users of verifiable credential technology, when expressing in JSON-LD, to define all their properties in some form in the JSON-LD context (note they are free to use @vocab for this too) and essentially do not allow "private claims". I understand the argument that this is a good natural force in the technology, driving producers to be precise, but I think it harms the usability of the technology, and it doesn't always work; e.g. if I include schema.org in my context, I'm free to define any property, it won't error, and I really have no idea what is going on. If we instead allowed for the "private claims" equivalent via the inclusion of @vocab, I think we strike a better balance: producers who want their credentials to use a fully defined vocab can, and those who don't won't, and will suffer the consequences of poorly contextualized data.

@tplooker (Author)

I also don't think a signed property that has been expanded to an IRI of https://www.w3.org/2018/credentials/undefinedTerm#someClaim -- one that, when dereferenced, clearly states to the user "Hey, this property is not formally registered, so we don't have a formal definition for you" -- means we are failing open from a security perspective, IMO.

@OR13 (Contributor)

OR13 commented Nov 17, 2020

  1. @vocab being used indicates a potential security issue that at least should yield a warning.
  2. @vocab is useful for folks who like seeing warnings instead of errors in their security applications.

I personally don't like seeing warnings, and I don't like setting additionalProperties:true in JSON Schema or const data: any = {} in typescript.

I don't think we should hand developers a tool which lets them accidentally poison themselves and their users... the recommendation should remain to at least warn, and best practice would be to throw.

I do think making changes to the VC Data Model in this regard would be helpful, but I would formally object to swallowing undefined terms by adding @vocab without a mandatory warning; it's like turning off a lint rule that protects against prototype pollution... the only reason to do so is because the developer is not good at security and can't figure out how to fix the issue correctly.

@OR13 (Contributor)

OR13 commented Nov 17, 2020

case in point: if I add __proto__ and it gets preserved with a warning instead of a thrown error, and then some system passes that "valid VC" to another verifier system that is vulnerable, I can actually get remote code execution on the verifier...

https://itnext.io/prototype-pollution-attack-on-nodejs-applications-94a8582373e7#:~:text=The%20Prototype%20Pollution%20attack%20(%20as,Remote%20Code%20Execution%20%E2%80%94%20RCE).

The fact that including schema.org allows for this exploit today, with no changes, does not mean we should normalize this behavior... it means we should patch the VC Data Model spec immediately to throw when @vocab is detected.
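
To make the risk concrete, here is a minimal, deliberately unsafe sketch of the class of bug being pointed at: a naive recursive merge of untrusted JSON that preserves a __proto__ key can pollute Object.prototype on the verifier. This is a generic illustration of the attack class, not code from any VC library.

```javascript
// DELIBERATELY VULNERABLE -- do not use. A naive deep merge that trusts
// every key in attacker-supplied JSON.
function unsafeMerge(target, source) {
  for (const key of Object.keys(source)) {
    const value = source[key];
    if (value && typeof value === 'object') {
      if (!target[key]) target[key] = {};
      // When key is "__proto__", target[key] resolves to Object.prototype,
      // so the recursion writes attacker-controlled properties onto it.
      unsafeMerge(target[key], value);
    } else {
      target[key] = value;
    }
  }
  return target;
}

// JSON.parse creates "__proto__" as an own property, which Object.keys
// then happily iterates over.
const attackerJson = JSON.parse('{"__proto__": {"isAdmin": true}}');
unsafeMerge({}, attackerJson);
console.log({}.isAdmin); // -> true: every object now appears to have isAdmin
```

A processor that throws on undefined terms rejects such a document before it ever reaches merge logic like this.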

@tplooker (Author)

I do think making changes to the VC Data Model in this regard would be helpful, but I would formally object to swallowing undefined terms by adding @vocab without a mandatory warning; it's like turning off a lint rule that protects against prototype pollution... the only reason to do so is because the developer is not good at security and can't figure out how to fix the issue correctly.

+1 to a warning

@OR13 (Contributor)

OR13 commented Nov 17, 2020

perhaps we can split this up into:

  1. @vocab detection MUST/SHOULD yield a warning.
  2. @vocab is included in credentials/v2

I could be supportive of both, but 1 is decidedly less controversial given the disclosed issue above.

@OR13 (Contributor)

OR13 commented Nov 17, 2020

ping @msporny @dlongley

@David-Chadwick (Contributor)

types and strictness are part of good security engineering... we should not be failing open... we should be failing closed.

This is not what happens in practice. If you look at today's browsers and interception products with X.509 certs, they typically fail open rather than closed, because usability trumps security. We have written about this extensively; see for example:

Wazan, Ahmad Samer; Laborde, Romain; Chadwick, David W; Kallel, Sana; Venant, Rémi; Barrere, François; Benzekri, Abdelmalek; Billoir, Eddie. "On the Validation of Web X.509 Certificates by TLS interception products", IEEE Transactions on Dependable and Secure Computing (2020). Print ISSN: 1545-5971, Online ISSN: 1545-5971, Digital Object Identifier: 10.1109/TDSC.2020.3000595

A.S. Wazan, R. Laborde, D.W. Chadwick, F. Barrere, A. Benzekri. "TLS Connection Validation by Web Browsers: Why do Web Browsers still not agree?". 41st IEEE Computers, Software, and Applications Conference (COMPSAC 2017), Politecnico di Torino, Turin, Italy July 4-8, 2017

Ahmad Samer Wazan, Romain Laborde, David W Chadwick, François Barrere, AbdelMalek Benzekri. “Which web browsers process SSL certificates in a standardized way?” 24th IFIP International Security Conference, Cyprus, May 18-20th, 2009

I personally don't like seeing warnings,

And users typically click through warnings, so they are ineffective, hence pointless.

@msporny (Member)

msporny commented Nov 18, 2020

If instead a default vocabulary was included in a new revision of the credentials context, then the signing process would proceed and the properties without a formal term definition would be expanded with the default vocab.

Big -1 to this :) for at least the following reason:

This is a security vocabulary and reasonable defaults should lead to failing closed. Having a default @vocab goes against that design principle. By that, I mean that if something is not defined, we shouldn't assume what the developer intended because while we may be right 99.99% of the time... the 0.01% of the time that we're wrong will be used as a catastrophic exploit in some critical infrastructure somewhere in the world.

This is why it's a good thing that it's so difficult to get digital signature libraries to validate a digital signature... it's a like a very advanced lock on a door protecting your most treasured secrets... if all the tumblers don't line up just right, the door shouldn't open.

Remember that developers may include this context not in their proof but at the top level of their object, and by doing so you may obliterate assumptions they have about @vocab. They may want to use one and you overwrite it, or you may create a default vocab when they definitely don't want one. @vocab was always meant as a hack for lazy development ("I'll just slam a @vocab up here until I have the time to write a solid JSON-LD Context").

So, let's please drop this on the floor as the bad idea that it is -- no default @vocab -- it'll lead to insecure systems.

@msporny (Member)

msporny commented Nov 18, 2020

apparently folks are already exploiting this behavior... https://github.com/blockchain-certificates/cert-verifier-js/blob/master/src/inspectors/computeLocalHash.js#L67

That's terrifying.

@OR13 (Contributor)

OR13 commented Nov 18, 2020

@msporny might we consider adding a warning or throwing an error if @vocab is detected? As you can see, folks are already digging graves for their users with this feature.

@dlongley (Contributor)

I think we've got a tooling problem -- not a spec problem here. We need better tools that make things easier for people to check their vocabularies, figure out what to do to fix them, or automate this process for them in a safe way. I don't think we need to (or should) change the core security underlying it all. VCs are improvements over existing systems for a number of reasons, one of which is the fail closed model -- where you have to define what you're talking about/signing vs. there being ambiguity around it -- whilst still allowing for decentralized extensions. The community just needs to make it easier for people to create and test those extensions.

@OR13 (Contributor)

OR13 commented Nov 18, 2020

@dlongley nobody is going to implement tooling support for non-normative requirements... IMO this is a spec issue, and presence of @vocab should result in an error, or at least a warning.... implementers should be normatively required to address this, because not doing so will lead to the kind of hacks noted in this thread:

  1. schema.org being included and @vocab silently swallowing undefined terms (no error, no warning).
  2. @vocab being injected to accomplish the same functionality via expandContext.push({ '@vocab': 'http://fallback.org/' });

Both of these are "valid interpretations of the vc api" imo; they are not good... I would prefer to see them forbidden.

In absence of spec changes, how might we address this?

@tplooker (Author)

Ok, so can we clarify the intended behaviour here: in ANY case of using @vocab for expansion, whether it be the default one (or not, as people are saying no), should a JSON-LD processor warn when doing JSON-LD expansion for the purposes of producing a linked data proof, OR should it error?

I believe it should at least warn, as it is trivial to include in a document's context today a context that makes use of @vocab; case in point are the numerous parties that use https://schema.org directly and most likely don't even know @vocab is active. So in effect the particular security model you have outlined, @msporny, is frequently, sometimes knowingly, sometimes unknowingly, being subverted.

On a side note a similar question applies to usage of @base.

Secondly, can someone please explain in more detail why failing on undocumented terms, versus signing them under an IRI that clearly states there is no formal definition, is failing open?

Finally, as @OR13 pointed out earlier, we (Mattr) originally looked at this as a solution because of 199, and without a solution to it, I would posit that a default vocab in a v2 credentials context is 100 times less of a security risk than silently dropping terms...

@tplooker (Author)

If you want to address this in a secure way, you should consider doing something like X.509 with its critical extension feature. With this feature, every attribute at the outer level is fully understood, even if an inner value is not. But the outer level says whether you can safely ignore inner values or not.

@David-Chadwick would you not still be able to detect which terms are fully understood via the expanded IRI that was used, e.g. if (term.startsWith("https://www.w3.org/2018/credentials/undefinedTerm#")) it's unknown

@tplooker (Author)

tplooker commented Nov 19, 2020

Or better still, if (term.type == "https://www.w3.org/2018/credentials/undefinedTerm") it's unknown
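
A hedged sketch of the check described in the two comments above, assuming the proposed undefinedTerm namespace were adopted (the namespace IRI is the one proposed in this issue, not a published one):

```javascript
// Flag properties that were expanded via the proposed default vocab.
// The namespace IRI below is only a proposal from this thread.
const UNDEFINED_TERM_NS = 'https://www.w3.org/2018/credentials/undefinedTerm#';

// After JSON-LD expansion, any property IRI under this namespace was
// produced by the @vocab fallback rather than a formal term definition.
function isUndefinedTerm(expandedIri) {
  return expandedIri.startsWith(UNDEFINED_TERM_NS);
}
```

A verifier could use such a check to warn, reject, or ignore undefined terms according to its own policy.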

@David-Chadwick (Contributor)

In X.509 you don't say whether the term is undefined or not; rather, the recipient decides whether it understands it or not. But each extension (term) is marked critical or not. Critical means you HAVE to understand it or reject the certificate. Non-critical means you can safely ignore the term if you don't understand it. So, mapping this to undefined schema terms, each one would need to be marked critical or not. A recipient that understands an undefined term marked critical can process the VC; all other recipients would have to reject it. If the undefined term is marked non-critical, then all recipients can safely ignore it if they don't understand it; those that do understand it will still process it. All defined schema terms are implicitly critical. This is the current implied meaning of the W3C recommendation.
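
As an illustration of this criticality model mapped onto VC terms (all shapes and names here are hypothetical; nothing like this exists in the VC Data Model today):

```javascript
// Sketch of the X.509 "critical extension" model applied to VC terms.
// Each term carries a critical flag; a recipient rejects the credential
// only when it fails to understand a term that is marked critical.
// Term shapes and names are hypothetical, for illustration only.
function acceptCredential(terms, understood) {
  // Non-critical terms the recipient does not understand are safely
  // ignored; critical ones must be understood.
  return terms.every(t => !t.critical || understood.has(t.name));
}

const understood = new Set(['degree']);
console.log(acceptCredential(
  [{ name: 'degree', critical: true }, { name: 'extra', critical: false }],
  understood
)); // -> true: the only critical term is understood
```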

@tplooker (Author)

All defined schema terms are implicitly critical. This is the current implied meaning of the W3C recommendation.

So in this model you are saying that all properties in a VC MUST always be understood by the verifier, otherwise it should reject it, meaning there is no notion of optionally processed information in a VC?

@tplooker (Author)

tplooker commented Nov 25, 2020

In X.509 you dont say whether the term is undefined or not, rather the recipient decides if it understands it or not.

Agreed, because just because a term has a formal definition does not directly imply that every consumer understands that definition, unless you impose the constraint you spoke of above: that all terms MUST have definitions and all verifiers MUST understand ALL definitions for the terms featured in the data they are consuming.

@OR13 (Contributor)

OR13 commented Nov 30, 2020

This issue is a bit of a tire fire... can we split this up into a few separate issues?

IMO they should be:

  1. Add warning about @vocab to spec.
  2. Add mandatory error for processing @vocab to spec
  3. Add ability to override error by defaulting @vocab to something that resolves to human readable documentation.

We don't need to agree to all 3, but we should be able to easily agree to 1.

@tplooker (Author)

tplooker commented Jan 27, 2022

You can't guess what the issuer's intent was because no semantics were assigned.

Yes but that continues to be the point, making this a blocking concern up front just means people either

  1. give up trying to define semantics, and therefore give up on the JSON-LD VC format
  2. do it badly
  3. OR (and in my experience it's a smaller minority) get it right

Pushing for all of the education around JSON-LD to be on the critical path to just getting started with VCs will damage its adoption.

When someone doesn't declare a variable in a programming language, and then they try to do something with the variable, the compiler (rightly) throws an error

This is the wrong analogy: an undefined term is not equivalent to an undefined variable, it's equivalent to an untyped variable, IMO. Which means @OR13's analogies around JS and TS are the right framing.

@dlongley (Contributor)

While I do find some of the arguments around usability compelling, I remain concerned around subtle vulnerabilities or threats that haven't been adequately discussed. For example, suppose:

Credential Type A uses the undefined term "foo" (which falls back to a globally unambiguous value using @vocab).

Credential Type B uses the undefined term "foo" (which falls back to a globally unambiguous value using @vocab).

Different issuers of A and B do not mean the same "foo" -- and perhaps the difference is stark or perhaps it is subtle. What happens when these things are confused in the marketplace for one another? How bad is the damage? Who is liable? How does this possibility affect the previous guarantees or promises made to the users of VCs? How does that impact uptake of VCs in the marketplace?

Using a default @vocab means converting a currently mandatory mechanism for eliminating ambiguity between two terms that look the same (but are not) to an optional one. We know that eliminating this kind of ambiguity is clearly better (and that security systems should fail closed), however, we have also experienced that it's currently more difficult for people to accomplish this than we would like. There may be any number of reasons for this -- and addressing any one of these causes may be harder/easier than others. The argument being made here, I believe, is that failing open in this case is a requirement to see an appropriate level of uptake (or an appropriate reduction in damage to adoption), i.e., without this, VCs will fail in the marketplace.

If the advocates for a default @vocab are right and VCs would fail in the marketplace without it, then it would clearly be better to add one -- provided we believe VCs are still useful with that change. However, it shouldn't be done without better discussion and understanding of the above threat(s). Maybe we'll discover we aren't much worse off with a default @vocab -- or we'll discover that we can take another approach or tweak this one. But if we don't analyze the threats and talk about how they can be mitigated or how they aren't really that threatening, we'll likely just continue to go back and forth here. And... that's because both parties, I believe, are acting in good faith and are both "right" about important goals:

  1. Having VCs be successful in the marketplace.
  2. Ensuring VCs actually improve the status quo.

We should also consider a way to reinstate globally unambiguous definitions as mandatory later on down the line -- once VCs are well adopted. Perhaps this could be as simple as having people upgrade to a new context and having verifiers prefer VCs that use the new context (that lacks the default @vocab). If the market really cares about this, that kind of transition should be possible, right?

@dlongley (Contributor)

@VladimirAlexiev,

Is it possible to add to the JSONLD spec that JSONLD processors should have no default @base, and should throw an error if a relative URL is used while no @base is set?

Btw, work is being done in the jsonld.js JavaScript JSON-LD processor to provide an option to throw errors in all cases where terms are not fully defined. DB's plan is to make all of our Data Integrity Proof (formerly LD Proofs) code always run with that option turned on. This isn't a spec level change (to JSON-LD), but it could be in the future.

@dlongley (Contributor)

VCs are operating in a space where we both want to enable security and authenticity -- but also open world flexibility and decentralization. The reason VCs had to be invented, IMO, is because other technologies that are close to the space have always fallen into one end of the spectrum or the other and neither is suitable for use in the main area of use cases that lie between. We're trying to distribute trust without having to distribute infrastructure.

We're likely going to need to adopt some blend here as well to get the adoption we need without losing the most important security benefits. We should also consider that the flexible technology choices we've already made allow for us to adapt with time -- as the needs of today may not be the needs of tomorrow.

@tplooker (Author)

While I do find some of the arguments around usability compelling, I remain concerned around subtle vulnerabilities or threats that haven't been adequately discussed. For example, suppose:

Credential Type A uses the undefined term "foo" (which falls back to a globally unambiguous value using @vocab).

Credential Type B uses the undefined term "foo" (which falls back to a globally unambiguous value using @vocab).

Different issuers of A and B do not mean the same "foo" -- and perhaps the difference is stark or perhaps it is subtle. What happens when these things are confused in the marketplace for one another? How bad is the damage? Who is liable? How does this possibility affect the previous guarantees or promises made to the users of VCs? How does that impact uptake of VCs in the marketplace?

IMO the semantics of an undefined term as a fallback via @vocab should discourage equivalence being drawn between the meaning of the term in credential type A versus credential type B; some of this could easily be facilitated by the technology through awareness of this "special" IRI.

Another point I would make is that because @vocab is a thing in JSON-LD, it is already being used unwittingly by some, so the issue you point out, @dlongley, is occurring, but in a much worse way. For example, if someone references schema.org in their credential today and does not explicitly define terms, the default vocab is https://schema.org, i.e. a resulting IRI that makes undefined terms indistinguishable from those actually explicitly defined in the context.

@tplooker (Author)

To be clear, I'm in favour of a proposal that incorporates both a default vocab in the VC context and normative guidance around implementations throwing warnings when @vocab is being used to expand a term during signing.

@tplooker (Author)

Implementations could then elect to have stricter processing rules, like throwing an error instead of just a warning.

@mprorock (Contributor)

To be clear, I'm in favour of a proposal that incorporates both a default vocab in the VC context and normative guidance around implementations throwing warnings when @vocab is being used to expand a term during signing.

Implementations could then elect to have stricter processing rules, like throwing an error instead of just a warning.

I am in agreement with this approach and think it is extremely sane

@TallTed (Member)

TallTed commented Jan 27, 2022

Actually it's better to use

"@vocab":"https://www.w3.org#undefinedTerm#"

I suggest instead using --

"@vocab":"https://example.org#undefinedTerm#"

-- because an accidental DDoS on www.w3.org would be bad.

This doesn't solve the problem of Credential Type A and Credential Type B both using the same unqualified term "uqt" in their JSON, and thus both falling back to the same qualified term "https://example.org#undefinedTerm#uqt", although they intended completely different meanings in A than in B ... but it does at least include the suggestion of a problem, as nothing based in example.org should be found in production.

@peacekeeper (Contributor)

peacekeeper commented Jan 31, 2022

I am a server administrator and I want to set up a firewall, so that my server can only be used for the services I intend to be used. So I configure the exact TCP/IP rules for the various services. Then someone complains to me that they can't reach something on my server. So, to fix the problem, I set up a 0.0.0.0/0 allow-all firewall rule on top of my existing rules, to improve usability.

That's how @vocab feels to me :)

@msporny (Member)

msporny commented Jan 31, 2022

@peacekeeper, you're not wrong -- that analogy made me chuckle. :)

Since everything in JSON-LD maps to a URL, it's not as terrible as just letting everything through (as in the firewall example above). You can set up a rule to redirect "everything else" to a particular namespace. So, you can contain the "damage" to a degree.

We could add a "developer mode" context that does have a @vocab, something like: https://www.w3.org/ns/2022/credentials/debug/v1 that contains a @vocab definition to the undefined term stuff people are talking about above.

Another possibility is to just use the term as a blank node, via something like "@vocab": "_:", which is always local to the document. So, an undefined term like foo would map to the blank node _:foo, which is always local to the document. That's the more semantically correct thing to do... but I'm probably forgetting some reason why that's a terrible idea.
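
A minimal sketch of how that blank-node fallback would behave (illustrative only; a real processor runs the full expansion algorithm, and "@vocab": "_:" is only floated as a possibility here):

```javascript
// Sketch of the '"@vocab": "_:"' idea: undefined terms map to blank
// node identifiers, which are local to the document, instead of to
// globally scoped IRIs. Simplified illustration, not a real processor.
function expandWithBlankNodeVocab(term, termDefinitions) {
  return termDefinitions[term] ?? '_:' + term;
}

console.log(expandWithBlankNodeVocab('foo', {})); // -> _:foo
```

Because blank node identifiers have no meaning outside the document, two issuers using the same undefined term would no longer appear to share a global identifier for it.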

That schema.org mapped everything to their namespace is probably not a pattern we want to follow. The whole goal of schema.org is to provide ONE namespace for the entire Web to define semantics in (at least, the semantics that search engines care about). So, it would make sense that they would want the option to "own" any term that is used on the Web. The VCWG does not have the same aspirations.

The thing that I'm still not sold on is this notion that not having @vocab is a major factor in why people are having a hard time with Verifiable Credentials. Since the VC industry is only a few years old, we continue to suffer from a "crappy tooling" standpoint. The focus should probably be there, instead of trying to find hacks w/ JSON-LD that will "just make the errors go away".

At the same point, we don't want to become the XHTML of the Web, raising an error even on correctable input... we all know how that ended -- HTML5 won because it was more lax and easier to work with for a class of authors that just didn't care about well formed documents. JSON-LD was intended to allow lazy development up until the point of production, and that's one of the reasons why @vocab exists. It was meant to be this thing that people used in development until they felt they were ready to lock down the semantics. It was never meant to be this thing that people used in production, and some of us looked on in horror as schema.org shipped @vocab to production.

In any case, there are enough options here where I feel like we'll eventually get to consensus... and in the meantime, people seem to be deploying VCs to production (even if it's not entirely a great experience at present). I'd put my money on better tooling helping that along more than @vocab.

@TallTed
Member

TallTed commented Jan 31, 2022

Another possibility is to just use the term as a blank node, via something like "@vocab": "_:", which is always local to the document. So, an undefined term like foo would map to the blank node _:foo, which is always local to the document. That's the more semantically correct thing to do... but I'm probably forgetting some reason why that's a terrible idea.

I'm forgetting the same reason(s). This might actually be the best immediate path forward, all things considered.

@OR13
Contributor

OR13 commented Jan 31, 2022

The thing that I'm still not sold on is this notion that not having @vocab is a major factor in why people are having a hard time with Verifiable Credentials.

It's a major problem with doing them in JSON-LD... it's not a problem if you are just using JOSE to sign JSON.

I think JOSE will be increasingly the preferred solution if we don't address this concern correctly.

@selfissued
Contributor

I agree with this statement:

I think we should be widening VC Data Model so that the "legal but not great category" is larger... and we should avoid narrowing the VC Data Model so that only JSON-LD experts can build valid JSON-LD credentials.... If we want large adoption we have to make the technology easier to use, and that means not putting the desires of experts over the experience of the average user.

I appreciate the thrust of @tplooker 's intent with this issue.

@dlongley
Contributor

dlongley commented Feb 2, 2022

@msporny,

Another possibility is to just use the term as a blank node, via something like "@vocab": "_:", which is always local to the document. ... That's the more semantically correct thing to do... but I'm probably forgetting some reason why that's a terrible idea.

The problem with that approach is that it doesn't avoid the errors that people want to avoid.

These errors are raised because terms have not been mapped to globally unambiguous identifiers. If you map terms to blank nodes, then the same error will end up being thrown because the term still isn't globally unambiguous (it won't translate to a valid predicate to be signed or verified). So, for code that properly checks that terms are unambiguously defined and raises errors when they are not, the outcome will be the same.

So, it seems that, if @vocab is to be used to solve the problem, then the right approach would be to use a base URL like <authority>/ambiguous-vocab# or <authority>/undefined-terms#. In fact, we could require @vocab and the base URL to be set to this -- and then indicate that production verifiers may want a strict mode that throws if they encounter any terms using it.
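For illustration, with a placeholder vocab URL (the actual URL would be whatever the group settles on), the context entry might look like:

```json
{
  "@context": {
    "@vocab": "https://www.w3.org/ns/credentials/undefined-terms#"
  }
}
```

With that in place, an undefined term like someClaim would expand to https://www.w3.org/ns/credentials/undefined-terms#someClaim, and a strict verifier could detect and reject any predicate under that base URL.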

@tplooker
Author

tplooker commented Feb 3, 2022

These errors are raised because terms have not been mapped to globally unambiguous identifiers. If you map terms to blank nodes, then the same error will end up being thrown because the term still isn't globally unambiguous (it won't translate to a valid predicate to be signed or verified). So, for code that properly checks that terms are unambiguously defined and raises errors when they are not, the outcome will be the same.

To elaborate on this a little more, I think the reason we cannot use blank nodes for expansion here is that the canonical form that gets signed when using JSON-LD signatures is a set of N-Quads (a serialization of RDF), which does not support relative IRIs. Correct me if I am wrong, @dlongley?

So, it seems that, if @vocab is to be used to solve the problem, then the right approach would be to use a base URL like <authority>/ambiguous-vocab# or <authority>/undefined-terms#. In fact, we could require @vocab and the base URL to be set to this -- and then indicate that production verifiers may want a strict mode that throws if they encounter any terms using it.

@dlongley, is your concrete proposal for the value to be <authority>/ambiguous-vocab#, or something else? I would recommend that we provide a default URL that can be recognised by implementations, rather than encouraging everyone to define their own default vocab, as that in itself would create another usability problem.

@dlongley
Contributor

dlongley commented Feb 4, 2022

To elaborate on this a little more, I think the reason we cannot use blank nodes for expansion here is that the canonical form that gets signed when using JSON-LD signatures is a set of N-Quads (a serialization of RDF), which does not support relative IRIs.

That's right (nor does it support blank node identifiers in the predicate position).
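To illustrate the constraint (subject and object below are made up): the N-Quads grammar allows only absolute IRIs in the predicate position, so the first statement is valid and the second is not.

```
<did:example:b34ca6cd37bbf23> <https://example.org/undefined-terms#someClaim> "A claim" .
<did:example:b34ca6cd37bbf23> _:someClaim "A claim" .
```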

@dlongley, is your concrete proposal for the value to be <authority>/ambiguous-vocab#, or something else? I would recommend that we provide a default URL that can be recognised by implementations, rather than encouraging everyone to define their own default vocab, as that in itself would create another usability problem.

I definitely didn't mean to imply that everyone would define their own URL. I was just trying to avoid a bikeshedding exercise in my comment. To be clear: there should be exactly one URL for this and it goes into the VC 2.0 @context. People can bikeshed exactly what that one URL should be.

@mprorock
Contributor

mprorock commented Feb 4, 2022

To be clear: there should be exactly one URL for this and it goes into the VC 2.0 @context. People can bikeshed exactly what that one URL should be.

+1

@mavarley

mavarley commented Feb 8, 2022

A clarification question: the default @vocab would map all otherwise-undefined terms to undefined/unspecified/ambiguous IRIs? Even popular terms like "name" and "birthdate", for example?

If that is the case I believe it meets the goal of providing an easy entry for new developers to get started with Verifiable Credentials and JSON-LD, while ensuring systems are capable of detecting undefined terms.

The default @vocab also gives developers a 'best practices' approach to working with undefined terms, as opposed to a myriad of other 'tricks' -- like pointing at "example.com" or other workarounds whose consequences they may not fully grasp.
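To make the fallback behavior concrete, here is a toy sketch of default-vocab term expansion (this is NOT a real JSON-LD processor, and the undefined-terms URL is a placeholder, since the group has not picked one):

```python
# Toy illustration of JSON-LD @vocab fallback. NOT a real JSON-LD
# processor; the undefined-terms URL below is a placeholder.
KEYWORDS = {"@context", "@id", "@type"}

def expand_term(term, context):
    """Map a term to an IRI: use its term definition if present,
    otherwise fall back to @vocab, mimicking default-vocab expansion."""
    if term in KEYWORDS:
        return term
    defined = context.get(term)
    if defined is not None:
        return defined
    vocab = context.get("@vocab")
    if vocab is not None:
        return vocab + term
    return None  # undefined term: a strict processor would raise here

context = {
    # Placeholder default vocab (the real URL is to be decided).
    "@vocab": "https://www.w3.org/ns/credentials/undefined-terms#",
    # A defined term, mapped as in the credentials v1 context.
    "issuer": "https://www.w3.org/2018/credentials#issuer",
}

print(expand_term("issuer", context))     # uses the term definition
print(expand_term("someClaim", context))  # falls back to @vocab
```

A defined term keeps its mapping, while "someClaim" lands under the placeholder namespace, where a strict verifier could spot it.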

@TallTed

This comment was marked as resolved.

@iherman
Member

iherman commented Aug 4, 2022

The issue was discussed in a meeting on 2022-08-03

  • no resolutions were taken
View the transcript

6.6. Default vocab for credentials context v2 (issue vc-data-model#753)

See github issue vc-data-model#753.

Michael Jones: 843 is one of a set of issues we're going to have about support specifically for JSON-LD in the core data model.
… I don't disagree with where the issue landed.
… There's fundamental factors and discussions to have about if/when the core data model requires JSON-LD and when it doesn't.
… This is one instance of that larger discussion. But don't want to discuss it now.

Brent Zundel: Thank you Mike, I agree that is a conversation the working group will be having - not today.

Manu Sporny: yep, vc-data-model.

Brent Zundel: 753... Keeping in the core data model?
… Label off.

@msporny
Member

msporny commented Sep 21, 2022

A recent discussion on this has been going on Twitter:

https://twitter.com/OR13b/status/1571223765106802689

Recording here for posterity:

Orie Steele @OR13b @Gkellogg @philarcher1 uses @danbri's idea to disable remote context loading via vocab... @selfissued

@kristinayasuda ... Some ideas from vc-jose draft in here to... + RFC9278 : )

Does it matter who signs content that is injected in web pages?

Gregg Kellogg @gkellogg · Sep 17 Replying to @OR13b
@vocab could lead to use of undefined vocabulary terms and possibly wrong string interpretations. Validating via SHACL would guard against this and could avoid loading contexts.

Dan Brickley @danbri · Sep 17
Can you spell out what’s so wrong about undefined vocabulary terms?

In RDF, RDFS (& OWL) are optional. Obviously catching errors & falsehoods is important - but use of @-context comes at huge complexity cost too. Wrong terms = semantically boring, but mostly harmless.

Gregg Kellogg @Gkellogg · Sep 17
It’s not necessarily a bad thing. But, when using a defined vocabulary, such as say FOAF, it can be an input error. “Hi, my foaf:mane is not Gregg”. Not something that maybe should be in a VC. But for informal vocabularies, maybe not a problem.

Orie Steele @OR13b · Sep 17
Issuers signing malformed data is not solved by adding additional semantics... Getting people to produce more semantic data is the key to getting people to care more about the quality of semantic data... Vocab adds basic usability, but doesn't block experts from going wild.

Dan Brickley @danbri · Sep 18
Allowing undefined terms at basic level of specs enables quick & easy testing of vocabulary improvements, locally useful extensions etc., which might at some point get included. As a maintainer of http://Schema.org + FOAF I selfishly appreciate how this reduces pressures!

Orie Steele @OR13b · Sep 18
I suspect @selfissued would agree, this also matches the basic assumptions around JSON extensibility... That you can just add properties when you need them.

@msporny
Member

msporny commented Oct 22, 2022

This is now a duplicate of #953, closing in favor of the newer issue which is where it seems the discussion has moved to.

@msporny msporny closed this as completed Oct 22, 2022