The application of blank node identifiers #10

Closed
tplooker opened this issue Apr 18, 2020 · 16 comments

@tplooker
Contributor

tplooker commented Apr 18, 2020

In RDF canonicalisation/normalisation, one of the trickiest parts of the algorithm is dealing with blank nodes. RDF Dataset Normalization defines a deterministic way to allocate identifiers to blank nodes during normalization.

However, the algorithm does not guarantee that the same blank node identifiers will be allocated if the graph is modified.

For example, the following JSON-LD document, when normalized

{
    "@context": [ "http://schema.org/",
                  "https://w3id.org/security/v2" ],
    "firstName": "Jane",
    "lastName": "Does",
    "jobTitle": "Professor",
    "telephone": "(425) 123-4567",
    "email": "jane.doe@example.com",
    "address": {
      "postalAddress": "test"
     }
}

Yields

_:c14n0 <http://schema.org/postalAddress> "test" .
_:c14n1 <http://schema.org/address> _:c14n0 .
_:c14n1 <http://schema.org/email> "jane.doe@example.com" .
_:c14n1 <http://schema.org/firstName> "Jane" .
_:c14n1 <http://schema.org/jobTitle> "Professor" .
_:c14n1 <http://schema.org/lastName> "Does" .
_:c14n1 <http://schema.org/telephone> "(425) 123-4567" .

Where we have two blank nodes _:c14n0 and _:c14n1.

If we then remove the address from the original JSON-LD, as shown below

{
    "@context": [ "http://schema.org/",
                  "https://w3id.org/security/v2" ],
    "firstName": "Jane",
    "lastName": "Does",
    "jobTitle": "Professor",
    "telephone": "(425) 123-4567",
    "email": "jane.doe@example.com"
}

We get different blank node identifiers for the same statements shown above.

_:c14n0 <http://schema.org/email> "jane.doe@example.com" .
_:c14n0 <http://schema.org/firstName> "Jane" .
_:c14n0 <http://schema.org/jobTitle> "Professor" .
_:c14n0 <http://schema.org/lastName> "Does" .
_:c14n0 <http://schema.org/telephone> "(425) 123-4567" .

Why is this a problem?

Because BBS signatures aim to enable selective disclosure of statements, i.e. revealing only a portion of the originally signed data graph, we must prove the integrity of the revealed statements to show that they were originally signed by the issuer.

Because we often reveal only a subset of the original statements, we need a way to guarantee that the node identifiers of the normalized statements being revealed match those that were originally signed.
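
To make the mismatch concrete, here is a minimal Python sketch. It does not run the normalization algorithm; it only takes the two canonical outputs shown above as hard-coded strings (the variable names are illustrative) and checks whether the statements of the reduced document can be matched against the originally signed ones.

```python
# Canonical statements of the full document, copied from the first example.
signed = {
    '_:c14n0 <http://schema.org/postalAddress> "test" .',
    '_:c14n1 <http://schema.org/address> _:c14n0 .',
    '_:c14n1 <http://schema.org/email> "jane.doe@example.com" .',
    '_:c14n1 <http://schema.org/firstName> "Jane" .',
    '_:c14n1 <http://schema.org/jobTitle> "Professor" .',
    '_:c14n1 <http://schema.org/lastName> "Does" .',
    '_:c14n1 <http://schema.org/telephone> "(425) 123-4567" .',
}

# Canonical statements of the same document with the address removed.
reduced = {
    '_:c14n0 <http://schema.org/email> "jane.doe@example.com" .',
    '_:c14n0 <http://schema.org/firstName> "Jane" .',
    '_:c14n0 <http://schema.org/jobTitle> "Professor" .',
    '_:c14n0 <http://schema.org/lastName> "Does" .',
    '_:c14n0 <http://schema.org/telephone> "(425) 123-4567" .',
}

# Statements are compared as exact strings, so a changed label breaks the match.
unmatched = sorted(reduced - signed)
print(f"{len(unmatched)} of {len(reduced)} statements no longer match")
```

Every statement of the reduced document fails to match, even though none of the underlying data changed; only the subject label moved from _:c14n1 to _:c14n0.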

Solution 1

When producing a BBS Signature, we use the blank node identifier algorithm to allocate blank node identifiers, which we then transform into proper node identifiers (e.g. _:c14n1 => urn:bnid:_:c14n1). We then sign the resulting statements. See below for an example.

Given the input JSON-LD document to be signed

{
    "@context": [ "http://schema.org/",
                  "https://w3id.org/security/v2" ],
    "firstName": "Jane",
    "lastName": "Does",
    "jobTitle": "Professor",
    "telephone": "(425) 123-4567",
    "email": "jane.doe@example.com",
    "address": {
      "postalAddress": "test"
    }
}

We normalize

_:c14n0 <http://schema.org/postalAddress> "test" .
_:c14n1 <http://schema.org/address> _:c14n0 .
_:c14n1 <http://schema.org/email> "jane.doe@example.com" .
_:c14n1 <http://schema.org/firstName> "Jane" .
_:c14n1 <http://schema.org/jobTitle> "Professor" .
_:c14n1 <http://schema.org/lastName> "Does" .
_:c14n1 <http://schema.org/telephone> "(425) 123-4567" .

Then transform any blank node identifiers into proper node identifiers to get our input to signing

<urn:bnid:_:c14n0> <http://schema.org/postalAddress> "test" .
<urn:bnid:_:c14n1> <http://schema.org/address> <urn:bnid:_:c14n0> .
<urn:bnid:_:c14n1> <http://schema.org/email> "jane.doe@example.com" .
<urn:bnid:_:c14n1> <http://schema.org/firstName> "Jane" .
<urn:bnid:_:c14n1> <http://schema.org/jobTitle> "Professor" .
<urn:bnid:_:c14n1> <http://schema.org/lastName> "Does" .
<urn:bnid:_:c14n1> <http://schema.org/telephone> "(425) 123-4567" .
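
The transformation step above could be sketched roughly as follows in Python. This is an illustration under assumptions, not the suite's implementation: the function name `skolemize` is hypothetical, and the regular expression assumes blank node labels consist of ASCII letters and digits, as the `c14nN` labels in this issue do.

```python
import re

# Match a quoted literal, an IRI, or a blank node label. Literals and IRIs are
# matched only so that label-like text inside them is left untouched.
_TOKEN = re.compile(r'"(?:[^"\\]|\\.)*"|<[^>]*>|_:[A-Za-z0-9]+')


def skolemize(statement: str) -> str:
    """Rewrite blank node labels such as _:c14n1 to <urn:bnid:_:c14n1>."""

    def repl(match: re.Match) -> str:
        token = match.group(0)
        if token.startswith("_:"):
            return f"<urn:bnid:{token}>"
        return token  # literal or IRI: keep as is

    return _TOKEN.sub(repl, statement)


normalized = [
    '_:c14n0 <http://schema.org/postalAddress> "test" .',
    '_:c14n1 <http://schema.org/address> _:c14n0 .',
]
for line in normalized:
    print(skolemize(line))
```

Substituting through a single regular expression with a callback avoids touching a literal value that merely happens to look like a blank node label.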

We then take the normalized form and convert it back to JSON-LD with the allocated blank node identifiers

{
    "@context": [ "http://schema.org/",
                  "https://w3id.org/security/v2" ],
    "id": "urn:bnid:_:c14n1",
    "firstName": "Jane",
    "lastName": "Does",
    "jobTitle": "Professor",
    "telephone": "(425) 123-4567",
    "email": "jane.doe@example.com",
    "address": {
      "id": "urn:bnid:_:c14n0",
      "postalAddress": "test"
    },
    "proof": {
      "type": "BbsBlsSignature2020",
      "created": "2020-04-18T05:26:47Z",
      "revealedStatements": [ 1, 3 ],
      "verificationMethod": "did:example:489398593#test",
      "proofPurpose": "assertionMethod",
      "signature": "BAXSgt7mVHjjpH6H......"
    }
}

This means that when we create a sub-graph, the identifiers for the formerly blank nodes are already allocated.

E.g. given the above digital signature, the following proof could be derived.

{
    "@context": [ "http://schema.org/",
                  "https://w3id.org/security/v2" ],
    "id": "urn:bnid:_:c14n1",
    "firstName": "Jane",
    "lastName": "Does",
    "proof": {
      "type": "BbsBlsSignatureProof2020",
      "verificationMethod": "did:example:489398593#test",
      "proofPurpose": "assertionMethod",
      "proof": "BAXSgt7mVHjjpH6H......",
      "requiredRevealStatements": [ 1, 3 ],
      "revealedStatements": [ 1, 3 ],
      "totalStatements": 10
    }
}
@tplooker
Contributor Author

tplooker commented Apr 19, 2020

The downside to the above solution is that it effectively mandates allocating identifiers to blank nodes, i.e. the signature suite would add node identifiers to all blank nodes in the data graph being signed, which could be problematic for other suites signing the graph that do not have this requirement. It also unintuitively adds a property to each blank node in the graph, which could confuse some developers as to why it is present.

@kdenhartog
Member

kdenhartog commented Apr 19, 2020

An alternative is to provide a normative requirement that an id is supplied for each block. I think I prefer the method you proposed because it handles that for the caller which seems to be a better separation of concerns (returning an error because it can’t canonicalize seems odd). I'll have to think about it a bit more.

@tplooker
Contributor Author

The other option is to compute the blank node identifiers when deriving a proof.

Deriving a proof takes in two inputs, the input proof document and the reveal document. Normalizing the input proof document will give us the original set of statements that were signed, including the correct blank node identifiers as the graph is identical to what was signed.

On proof derivation, we could elect to transform the blank node identifiers into proper node identifiers and return these in the derived proof. On proof verification, the processing software would then need to substitute the transformed blank node identifier (urn:bnid:_:c14n1) back into a normal blank node identifier (_:c14n1) during normalization.

An example follows.

Given the input JSON-LD document to be signed

{
    "@context": [ "http://schema.org/",
                  "https://w3id.org/security/v2" ],
    "firstName": "Jane",
    "lastName": "Does",
    "jobTitle": "Professor",
    "telephone": "(425) 123-4567",
    "email": "jane.doe@example.com",
    "address": {
      "postalAddress": "test"
    }
}

Yields

_:c14n0 <http://schema.org/postalAddress> "test" .
_:c14n1 <http://schema.org/address> _:c14n0 .
_:c14n1 <http://schema.org/email> "jane.doe@example.com" .
_:c14n1 <http://schema.org/firstName> "Jane" .
_:c14n1 <http://schema.org/jobTitle> "Professor" .
_:c14n1 <http://schema.org/lastName> "Does" .
_:c14n1 <http://schema.org/telephone> "(425) 123-4567" .

We sign the normalized statements as is and return the following

{
    "@context": [ "http://schema.org/",
                  "https://w3id.org/security/v2" ],
    "firstName": "Jane",
    "lastName": "Does",
    "jobTitle": "Professor",
    "telephone": "(425) 123-4567",
    "email": "jane.doe@example.com",
    "address": {
      "postalAddress": "test"
    },
    "proof": {
      "type": "BbsBlsSignature2020",
      "created": "2020-04-18T05:26:47Z",
      "revealedStatements": [ 1, 3 ],
      "verificationMethod": "did:example:489398593#test",
      "proofPurpose": "assertionMethod",
      "signature": "BAXSgt7mVHjjpH6H......"
    }
}

Then when we want to derive a proof, say with the reveal document of

{
    "@context": [ "http://schema.org/",
                  "https://w3id.org/security/v2" ],
    "firstName": "Jane",
    "lastName": "Does"
}

We would normalize the input proof to re-obtain the original normalized form

_:c14n0 <http://schema.org/postalAddress> "test" .
_:c14n1 <http://schema.org/address> _:c14n0 .
_:c14n1 <http://schema.org/email> "jane.doe@example.com" .
_:c14n1 <http://schema.org/firstName> "Jane" .
_:c14n1 <http://schema.org/jobTitle> "Professor" .
_:c14n1 <http://schema.org/lastName> "Does" .
_:c14n1 <http://schema.org/telephone> "(425) 123-4567" .

Diff the obtained statements with the statements we would like to reveal

_:c14n1 <http://schema.org/firstName> "Jane" .
_:c14n1 <http://schema.org/lastName> "Does" .
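
A rough Python sketch of this diff step is shown below. It assumes the statements to reveal have already been expressed with the original blank node labels, as in the listing above, and that statements are identified by their zero-based position in the original canonical form; both the index convention and the variable names are assumptions made for illustration.

```python
# Original canonical statements, in the order produced by normalization.
original = [
    '_:c14n0 <http://schema.org/postalAddress> "test" .',
    '_:c14n1 <http://schema.org/address> _:c14n0 .',
    '_:c14n1 <http://schema.org/email> "jane.doe@example.com" .',
    '_:c14n1 <http://schema.org/firstName> "Jane" .',
    '_:c14n1 <http://schema.org/jobTitle> "Professor" .',
    '_:c14n1 <http://schema.org/lastName> "Does" .',
    '_:c14n1 <http://schema.org/telephone> "(425) 123-4567" .',
]

# Statements we would like to reveal, using the original labels.
to_reveal = {
    '_:c14n1 <http://schema.org/firstName> "Jane" .',
    '_:c14n1 <http://schema.org/lastName> "Does" .',
}

# Refuse to continue if a requested statement was never signed.
missing = to_reveal - set(original)
if missing:
    raise ValueError(f"statements not present in the signed document: {missing}")

# Positions of the revealed statements within the signed statement list.
revealed_indices = [i for i, s in enumerate(original) if s in to_reveal]
print(revealed_indices)  # [3, 5]
```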

Transform the blank node identifiers

<urn:bnid:_:c14n1> <http://schema.org/firstName> "Jane" .
<urn:bnid:_:c14n1> <http://schema.org/lastName> "Does" .

Derive the proof and convert back to JSON-LD, giving the same output as before

{
    "@context": [ "http://schema.org/",
                  "https://w3id.org/security/v2" ],
    "id": "urn:bnid:_:c14n1",
    "firstName": "Jane",
    "lastName": "Does",
    "proof": {
      "type": "BbsBlsSignatureProof2020",
      "verificationMethod": "did:example:489398593#test",
      "proofPurpose": "assertionMethod",
      "proof": "BAXSgt7mVHjjpH6H......",
      "requiredRevealStatements": [ 1, 3 ],
      "revealedStatements": [ 1, 3 ],
      "totalStatements": 10
    }
}

But when we verify, normalizing the revealed statements again gives, e.g.

<urn:bnid:_:c14n1> <http://schema.org/firstName> "Jane" .
<urn:bnid:_:c14n1> <http://schema.org/lastName> "Does" .

We would need to convert the node identifiers of the relevant statements back into normal blank node identifiers before verifying the proof, e.g.

_:c14n1 <http://schema.org/firstName> "Jane" .
_:c14n1 <http://schema.org/lastName> "Does" .
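
The reverse substitution at verification time could look roughly like this in Python. It is a sketch only: the function name `deskolemize` is hypothetical, and it assumes the `urn:bnid:` prefix and label shape used in the examples of this issue.

```python
import re

# Match a quoted literal (left untouched) or a transformed identifier such as
# <urn:bnid:_:c14n1>; group 1 captures the original blank node label.
_TOKEN = re.compile(r'"(?:[^"\\]|\\.)*"|<urn:bnid:(_:[A-Za-z0-9]+)>')


def deskolemize(statement: str) -> str:
    """Rewrite <urn:bnid:_:c14n1> back to the blank node label _:c14n1."""

    def repl(match: re.Match) -> str:
        label = match.group(1)
        return label if label is not None else match.group(0)

    return _TOKEN.sub(repl, statement)


revealed = [
    '<urn:bnid:_:c14n1> <http://schema.org/firstName> "Jane" .',
    '<urn:bnid:_:c14n1> <http://schema.org/lastName> "Does" .',
]
for line in revealed:
    print(deskolemize(line))
```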

@tplooker
Contributor Author

Advantages and disadvantages of the two options

Option 1 (Described in the original issue text)

Advantages
This approach essentially disallows blank nodes in graphs signed using BBSSignature2020 by using the blank node identifier algorithm to assign an identifier to every blank node. There is no custom processing of the normalized form downstream, i.e. when deriving or validating a proof.

Disadvantages
BBSSignature2020 will enforce the assignment of blank node identifiers, which could be problematic if the data graph was already signed. It might also confuse developers as to why this special "id" field needs to be present.

Option 2

Advantages
No requirement for id in graphs signed with BBSSignature2020, graphs signed with other proof types would still be compatible

Disadvantages
Requires custom processing of the normalized form at the derive and verify proof steps.

@OR13
Contributor

OR13 commented Apr 19, 2020

I guess option 2 is better, moves the complexity from external to internal, always better to shoulder the burden than expose a quirk to library consumers...

@tplooker
Contributor Author

tplooker commented Apr 19, 2020

Ok option 2 it is for the time being!

@kdenhartog
Member

probably worth checking that some of the details needed for generating the blank nodes aren't lost/modified when doing proof generation based on the original VC. If this holds such that the verifier doesn't get bad signatures when verifying then I think option 2 is the better route as well.

@tplooker
Contributor Author

tplooker commented May 19, 2020

An alternative proposal suggested by @dlongley is to have a mapping for any BN identifiers in the proof to prevent the need for special skolem URIs (e.g urn:bnid:_:c14n1).

This mapping must map the blank node identifiers allocated to the statements in the canonical form of the derivedProof back to what their blank node identifiers were in the originally signed proof.

Because all blank nodes have a common prefix, e.g. _:c14n*, we essentially only need a map between the numbers assigned to the blank nodes. For example, a mapping from _:c14n1 => _:c14n0 could be shortened to 1 => 0.

We could express this in the proof in one of two ways.

  1. An ordered array of unsigned integers, where the index in the array represents the blank node id in the derived proof and the value is the blank node id in the original proof. This would result in something like the following.
{
    .....data graph for which the proof applies to
    "proof": {
      "type": "BbsBlsSignatureProof2020",
      "verificationMethod": "did:example:489398593#test",
      "proofPurpose": "assertionMethod",
      "proof": "BAXSgt7mVHjjpH6H......",
      "requiredRevealStatements": [ 1, 3 ],
      "blankNodeIdentiferMap": [
           1,
           0,
           2,
           3
      ]
    }
}
  2. A map, where the key represents the blank node id in the derived proof and the value is the blank node id in the original proof. This would result in something like the following.
{
    .....data graph for which the proof applies to
    "proof": {
      "type": "BbsBlsSignatureProof2020",
      "verificationMethod": "did:example:489398593#test",
      "proofPurpose": "assertionMethod",
      "proof": "BAXSgt7mVHjjpH6H......",
      "requiredRevealStatements": [ 1, 3 ],
      "blankNodeIdentiferMap": {
           "0": 1,
           "1": 0,
           "2": 2,
           "3": 3
      }
    }
}
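
As a rough illustration of how a verifier might apply the array form of the mapping, here is a Python sketch. The function name `relabel` is hypothetical, and the sketch assumes all blank node labels have the `_:c14nN` shape described above. All labels are substituted in a single pass so that a swap such as 0 => 1 and 1 => 0 does not overwrite itself.

```python
import re

# Match a quoted literal, an IRI, or a canonical blank node label. Only the
# label alternative captures group 1 (the number after the common prefix).
_TOKEN = re.compile(r'"(?:[^"\\]|\\.)*"|<[^>]*>|_:c14n(\d+)')


def relabel(statement: str, id_map: list[int]) -> str:
    """Map derived-proof labels back to the originally signed labels.

    id_map[i] is the original number for the label _:c14n<i> in the derived proof.
    """

    def repl(match: re.Match) -> str:
        number = match.group(1)
        if number is None:
            return match.group(0)  # literal or IRI: keep as is
        return f"_:c14n{id_map[int(number)]}"

    return _TOKEN.sub(repl, statement)


id_map = [1, 0, 2, 3]
derived = [
    '_:c14n0 <http://schema.org/firstName> "Jane" .',
    '_:c14n0 <http://schema.org/lastName> "Does" .',
]
for line in derived:
    print(relabel(line, id_map))
```

With this mapping, the statements of the reduced example from the start of the issue come out with the _:c14n1 subject they had when originally signed. The map form would work the same way after converting its string keys to integers.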

@dlongley is this an accurate summary of the idea you had in mind?

@dlongley
Contributor

@tplooker,

@dlongley, is this an accurate summary of the idea you had in mind?

Yes. I also think that the mapping could be embedded in the proofValue -- allowing for some more compact representation, i.e., it doesn't have to be broken out at the JSON level.

@tplooker
Contributor Author

@dlongley Ah interesting that would further simplify the representation, we will review that as an idea

@tplooker
Contributor Author

tplooker commented May 19, 2020

@dlongley did you have any thoughts on how this binary encoding should be done? I.e. a custom binary format, or should we use a more formalised method like Protocol Buffers or MessagePack?

The simplest custom way I see doing this would be something like the following

proofValue = base64(int16(blankNodeIdentiferMap.length)|int16(blankNodeIdentiferMap[0])..|..int16(blankNodeIdentiferMap[blankNodeIdentiferMap.length - 1])|remainingProofValueOutputedByDeriveProofAPI)

Where | indicates concatenation and blankNodeIdentiferMap is the unsigned int array described in the example above.
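
A minimal Python sketch of this custom layout, for discussion only. The byte order (big-endian), the use of unsigned 16-bit integers, and standard base64 are assumptions that the proposal above does not pin down; the function names and the proof bytes are placeholders.

```python
import base64
import struct


def encode_proof_value(id_map: list[int], proof_bytes: bytes) -> str:
    """Prefix the proof bytes with a length-prefixed array of 16-bit ids."""
    # ">H" is one big-endian unsigned 16-bit integer; "<n>H" repeats it n times.
    header = struct.pack(f">H{len(id_map)}H", len(id_map), *id_map)
    return base64.b64encode(header + proof_bytes).decode("ascii")


def decode_proof_value(proof_value: str) -> tuple[list[int], bytes]:
    """Split a proof value back into the id map and the remaining proof bytes."""
    raw = base64.b64decode(proof_value)
    (count,) = struct.unpack_from(">H", raw, 0)
    id_map = list(struct.unpack_from(f">{count}H", raw, 2))
    # The header occupies 2 bytes for the count plus 2 bytes per entry.
    return id_map, raw[2 + 2 * count:]


encoded = encode_proof_value([1, 0, 2, 3], b"\x01\x02\x03")
print(decode_proof_value(encoded))
```

A fixed 16-bit width caps both the map length and each id at 65535, which is one of the trade-offs to weigh against a self-describing format.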

@dlongley
Contributor

dlongley commented May 19, 2020

My vote for binary representations is CBOR. There is likely to be significant work in the future on CBOR-LD (including representing VCs using that format) and it would be best not to introduce yet another competing format with protocol buffers/etc. So whatever you guys come up with that makes sense in CBOR I would be +1 for.

@kdenhartog
Member

@TimoGlastra said during IIW that he was going to take this issue.

@TimoGlastra
Contributor

@kdenhartog RE your note in #37. I just discovered you can only assign outside collaborators when they've commented. So here's my comment

@OR13
Contributor

OR13 commented May 1, 2023

Can we fold this into #60? I am going to mark it pending close.

@OR13 added the pending-close label (The issue will be closed in 7 days if there are no objections.) May 1, 2023
@Wind4Greg
Collaborator

Please see the updated document for mechanism details that address these concerns. See PR #101 for privacy considerations related to both data leakage and unlinkability, including analysis.
