The application of blank node identifiers #10

tplooker · 2020-04-18T21:03:27Z

In RDF canonicalization/normalization, one of the trickiest parts of the algorithm is dealing with blank nodes. RDF Dataset Normalization defines a way to deterministically allocate identifiers for blank nodes.

However, the algorithm does not guarantee that the same blank node identifiers will be allocated if the graph is modified.

For example

The following JSON-LD normalized

{
    "@context": [ "http://schema.org/",
                  "https://w3id.org/security/v2" ],
    "firstName": "Jane",
    "lastName": "Does",
    "jobTitle": "Professor",
    "telephone": "(425) 123-4567",
    "email": "jane.doe@example.com",
    "address": {
      "postalAddress": "test"
     }
}

Yields

_:c14n0 <http://schema.org/postalAddress> "test" .
_:c14n1 <http://schema.org/address> _:c14n0 .
_:c14n1 <http://schema.org/email> "jane.doe@example.com" .
_:c14n1 <http://schema.org/firstName> "Jane" .
_:c14n1 <http://schema.org/jobTitle> "Professor" .
_:c14n1 <http://schema.org/lastName> "Does" .
_:c14n1 <http://schema.org/telephone> "(425) 123-4567" .

Where we have two blank nodes _:c14n0 and _:c14n1.

If we then remove the address from the original JSON-LD, i.e. like below

{
    "@context": [ "http://schema.org/",
                  "https://w3id.org/security/v2" ],
    "firstName": "Jane",
    "lastName": "Does",
    "jobTitle": "Professor",
    "telephone": "(425) 123-4567",
    "email": "jane.doe@example.com"
}

We get different blank node identifiers for the same statements shown above.

_:c14n0 <http://schema.org/email> "jane.doe@example.com" .
_:c14n0 <http://schema.org/firstName> "Jane" .
_:c14n0 <http://schema.org/jobTitle> "Professor" .
_:c14n0 <http://schema.org/lastName> "Does" .
_:c14n0 <http://schema.org/telephone> "(425) 123-4567" .

Why is this a problem?

Because bbs-signatures aim to enable selective disclosure of statements, i.e. revealing only a portion of the originally signed data graph, and we must prove the integrity of the revealed statements to show they were originally signed by the issuer.

Because we often reveal only a subset of the original statements, we need a way to guarantee that the node identifiers of the normalized statements being revealed match those that were originally signed.
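To make the mismatch concrete, here is a minimal Python sketch that compares the two canonical listings above as plain strings. The statements are copied from the examples; the variable names are only illustrative.

```python
# Canonical N-Quads copied from the two listings above.
full_graph = [
    '_:c14n0 <http://schema.org/postalAddress> "test" .',
    '_:c14n1 <http://schema.org/address> _:c14n0 .',
    '_:c14n1 <http://schema.org/email> "jane.doe@example.com" .',
    '_:c14n1 <http://schema.org/firstName> "Jane" .',
    '_:c14n1 <http://schema.org/jobTitle> "Professor" .',
    '_:c14n1 <http://schema.org/lastName> "Does" .',
    '_:c14n1 <http://schema.org/telephone> "(425) 123-4567" .',
]
without_address = [
    '_:c14n0 <http://schema.org/email> "jane.doe@example.com" .',
    '_:c14n0 <http://schema.org/firstName> "Jane" .',
    '_:c14n0 <http://schema.org/jobTitle> "Professor" .',
    '_:c14n0 <http://schema.org/lastName> "Does" .',
    '_:c14n0 <http://schema.org/telephone> "(425) 123-4567" .',
]

# A verifier can only check statements that match what was signed exactly.
signed = set(full_graph)
unmatched = [s for s in without_address if s not in signed]

# Every statement of the reduced graph fails to match, because the subject
# was relabeled from _:c14n1 to _:c14n0.
print(len(unmatched), "of", len(without_address), "statements do not match")
```

Although the reduced graph carries the same data, none of its canonical statements are string-equal to a signed statement.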

Solution 1

When producing a BBS signature, we use the blank node identifier algorithm to allocate blank node identifiers, which we then transform into proper node identifiers (e.g. _:c14n1 => urn:bnid:_:c14n1). We then sign the resulting statements; see below for an example.

Given the input JSON-LD document to be signed

{
    "@context": [ "http://schema.org/",
                  "https://w3id.org/security/v2" ],
    "firstName": "Jane",
    "lastName": "Does",
    "jobTitle": "Professor",
    "telephone": "(425) 123-4567",
    "email": "jane.doe@example.com",
    "address": {
      "postalAddress": "test"
    }
}

We normalize

_:c14n0 <http://schema.org/postalAddress> "test" .
_:c14n1 <http://schema.org/address> _:c14n0 .
_:c14n1 <http://schema.org/email> "jane.doe@example.com" .
_:c14n1 <http://schema.org/firstName> "Jane" .
_:c14n1 <http://schema.org/jobTitle> "Professor" .
_:c14n1 <http://schema.org/lastName> "Does" .
_:c14n1 <http://schema.org/telephone> "(425) 123-4567" .

Then transform any blank node identifiers into proper node identifiers to get our input into signing

<urn:bnid:_:c14n0> <http://schema.org/postalAddress> "test" .
<urn:bnid:_:c14n1> <http://schema.org/address> <urn:bnid:_:c14n0> .
<urn:bnid:_:c14n1> <http://schema.org/email> "jane.doe@example.com" .
<urn:bnid:_:c14n1> <http://schema.org/firstName> "Jane" .
<urn:bnid:_:c14n1> <http://schema.org/jobTitle> "Professor" .
<urn:bnid:_:c14n1> <http://schema.org/lastName> "Does" .
<urn:bnid:_:c14n1> <http://schema.org/telephone> "(425) 123-4567" .
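The transformation step can be sketched in Python as a per-line rewrite of the canonical N-Quads. This is only an illustration: the `skolemize` name is hypothetical, and the regular expression assumes labels of the form `_:c14nN` that appear as whole terms. A real implementation would parse the N-Quads rather than pattern-match, since a string literal containing similar text would also be rewritten.

```python
import re

# Matches a canonical blank node label such as _:c14n0 when it stands alone
# as a term (at the start of the line or after whitespace).
# Assumption: literals never contain a whitespace-delimited "_:c14nN" token.
BLANK_NODE = re.compile(r'(?<!\S)_:(c14n\d+)(?=\s)')


def skolemize(statement: str) -> str:
    """Rewrite blank node labels as urn:bnid: IRIs, e.g. _:c14n1 -> <urn:bnid:_:c14n1>."""
    return BLANK_NODE.sub(r'<urn:bnid:_:\1>', statement)


normalized = [
    '_:c14n0 <http://schema.org/postalAddress> "test" .',
    '_:c14n1 <http://schema.org/address> _:c14n0 .',
]
for line in normalized:
    print(skolemize(line))
```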

We then take the normalized form and cast it back to JSON-LD with the allocated blank node identifiers

{
    "@context": [ "http://schema.org/",
                  "https://w3id.org/security/v2" ],
    "id": "urn:bnid:_:c14n1",
    "firstName": "Jane",
    "lastName": "Does",
    "jobTitle": "Professor",
    "telephone": "(425) 123-4567",
    "email": "jane.doe@example.com",
    "address": {
      "id": "urn:bnid:_:c14n0",
      "postalAddress": "test"
    },
    "proof": {
      "type": "BbsBlsSignature2020",
      "created": "2020-04-18T05:26:47Z",
      "revealedStatements": [ 1, 3 ],
      "verificationMethod": "did:example:489398593#test",
      "proofPurpose": "assertionMethod",
      "signature": "BAXSgt7mVHjjpH6H......"
    }
}

This means that when we create a sub-graph, the formerly blank nodes already carry their allocated identifiers.

E.g. given the above digital signature, the following proof could be derived.

{
    "@context": [ "http://schema.org/",
                  "https://w3id.org/security/v2" ],
    "id": "urn:bnid:_:c14n1",
    "firstName": "Jane",
    "lastName": "Does",
    "proof": {
      "type": "BbsBlsSignatureProof2020",
      "verificationMethod": "did:example:489398593#test",
      "proofPurpose": "assertionMethod",
      "proof": "BAXSgt7mVHjjpH6H......",
      "requiredRevealStatements": [ 1, 3 ],
      "revealedStatements": [ 1, 3 ],
      "totalStatements": 10
    }
}

tplooker · 2020-04-19T04:24:33Z

The downside to the above solution is that it effectively mandates allocating identifiers to blank nodes, i.e. the signature suite would add node identifiers to all blank nodes in the data graph being signed, which could be problematic for other suites signing the graph that do not have this requirement. It also unintuitively adds a property to each blank node in the graph, which could confuse some developers as to why it is present.

kdenhartog · 2020-04-19T09:38:32Z

An alternative is to provide a normative requirement that an id is supplied for each block. I think I prefer the method you proposed because it handles that for the caller which seems to be a better separation of concerns (returning an error because it can’t canonicalize seems odd). I'll have to think about it a bit more.

tplooker · 2020-04-19T20:02:32Z

The other option is to compute the blank node identifiers when deriving a proof.

Deriving a proof takes in two inputs, the input proof document and the reveal document. Normalizing the input proof document will give us the original set of statements that were signed, including the correct blank node identifiers as the graph is identical to what was signed.

On proof derivation, we could elect to transform the blank node identifiers into proper node identifiers and return these in the derived proof. Then, on proof verification, the processing software would need to substitute the transformed blank node identifier (urn:bnid:_:c14n1) back into a normal blank node identifier (_:c14n1) during normalization.

As an example below.

Given the input JSON-LD document to be signed

{
    "@context": [ "http://schema.org/",
                  "https://w3id.org/security/v2" ],
    "firstName": "Jane",
    "lastName": "Does",
    "jobTitle": "Professor",
    "telephone": "(425) 123-4567",
    "email": "jane.doe@example.com",
    "address": {
      "postalAddress": "test"
    }
}

Yields

_:c14n0 <http://schema.org/postalAddress> "test" .
_:c14n1 <http://schema.org/address> _:c14n0 .
_:c14n1 <http://schema.org/email> "jane.doe@example.com" .
_:c14n1 <http://schema.org/firstName> "Jane" .
_:c14n1 <http://schema.org/jobTitle> "Professor" .
_:c14n1 <http://schema.org/lastName> "Does" .
_:c14n1 <http://schema.org/telephone> "(425) 123-4567" .

We sign the normalized statements as is and return the following

{
    "@context": [ "http://schema.org/",
                  "https://w3id.org/security/v2" ],
    "firstName": "Jane",
    "lastName": "Does",
    "jobTitle": "Professor",
    "telephone": "(425) 123-4567",
    "email": "jane.doe@example.com",
    "address": {
      "postalAddress": "test"
    },
    "proof": {
      "type": "BbsBlsSignature2020",
      "created": "2020-04-18T05:26:47Z",
      "revealedStatements": [ 1, 3 ],
      "verificationMethod": "did:example:489398593#test",
      "proofPurpose": "assertionMethod",
      "signature": "BAXSgt7mVHjjpH6H......"
    }
}

Then when we want to derive a proof, say with the reveal document of

{
    "@context": [ "http://schema.org/",
                  "https://w3id.org/security/v2" ],
    "firstName": "Jane",
    "lastName": "Does"
}

We would normalize the input proof to re-obtain the original normalized form

_:c14n0 <http://schema.org/postalAddress> "test" .
_:c14n1 <http://schema.org/address> _:c14n0 .
_:c14n1 <http://schema.org/email> "jane.doe@example.com" .
_:c14n1 <http://schema.org/firstName> "Jane" .
_:c14n1 <http://schema.org/jobTitle> "Professor" .
_:c14n1 <http://schema.org/lastName> "Does" .
_:c14n1 <http://schema.org/telephone> "(425) 123-4567" .

Diff the obtained statements with the statements we would like to reveal

_:c14n1 <http://schema.org/firstName> "Jane" .
_:c14n1 <http://schema.org/lastName> "Does" .
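The diff step can be sketched in Python as a lookup of each revealed statement in the signed list. This is a sketch under assumptions: the `revealed_indices` helper is hypothetical, the revealed statements are assumed to already use the original blank node labels (as in the listing above), and positions are counted from zero in canonical order, which may not match how the suite numbers statements.

```python
def revealed_indices(signed_statements, revealed_statements):
    """Return the zero-based positions of the revealed statements in the signed list.

    Raises KeyError if a revealed statement was never signed.
    """
    position = {statement: i for i, statement in enumerate(signed_statements)}
    return sorted(position[statement] for statement in revealed_statements)


signed = [
    '_:c14n0 <http://schema.org/postalAddress> "test" .',
    '_:c14n1 <http://schema.org/address> _:c14n0 .',
    '_:c14n1 <http://schema.org/email> "jane.doe@example.com" .',
    '_:c14n1 <http://schema.org/firstName> "Jane" .',
    '_:c14n1 <http://schema.org/jobTitle> "Professor" .',
    '_:c14n1 <http://schema.org/lastName> "Does" .',
    '_:c14n1 <http://schema.org/telephone> "(425) 123-4567" .',
]
revealed = [
    '_:c14n1 <http://schema.org/firstName> "Jane" .',
    '_:c14n1 <http://schema.org/lastName> "Does" .',
]
print(revealed_indices(signed, revealed))
```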

Transform the blank node identifiers

<urn:bnid:_:c14n1> <http://schema.org/firstName> "Jane" .
<urn:bnid:_:c14n1> <http://schema.org/lastName> "Does" .

Derive the proof and convert back to JSON-LD, giving the same output as before

{
    "@context": [ "http://schema.org/",
                  "https://w3id.org/security/v2" ],
    "id": "urn:bnid:_:c14n1",
    "firstName": "Jane",
    "lastName": "Does",
    "proof": {
      "type": "BbsBlsSignatureProof2020",
      "verificationMethod": "did:example:489398593#test",
      "proofPurpose": "assertionMethod",
      "proof": "BAXSgt7mVHjjpH6H......",
      "requiredRevealStatements": [ 1, 3 ],
      "revealedStatements": [ 1, 3 ],
      "totalStatements": 10
    }
}

But when we verify and have normalized the revealed statements again, e.g.

<urn:bnid:_:c14n1> <http://schema.org/firstName> "Jane" .
<urn:bnid:_:c14n1> <http://schema.org/lastName> "Does" .

We would need to convert the node identifiers of the relevant statements back into normal blank node identifiers before verifying the proof, e.g.

_:c14n1 <http://schema.org/firstName> "Jane" .
_:c14n1 <http://schema.org/lastName> "Does" .
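The verify-side substitution could look roughly like the following Python sketch. The `deskolemize` name is hypothetical, and the pattern assumes the `urn:bnid:_:c14nN` form used in this thread.

```python
import re

# Matches the transformed identifiers used in this thread, e.g. <urn:bnid:_:c14n1>.
SKOLEM_IRI = re.compile(r'<urn:bnid:(_:c14n\d+)>')


def deskolemize(statement: str) -> str:
    """Turn <urn:bnid:_:c14nN> back into the blank node label _:c14nN."""
    return SKOLEM_IRI.sub(r'\1', statement)


revealed = [
    '<urn:bnid:_:c14n1> <http://schema.org/firstName> "Jane" .',
    '<urn:bnid:_:c14n1> <http://schema.org/lastName> "Does" .',
]
for line in revealed:
    print(deskolemize(line))
```

Because the URN embeds the full original label, the reverse step needs no extra state beyond the revealed statements themselves.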

tplooker · 2020-04-19T20:10:17Z

Advantages and disadvantages of the two options

Option 1 (Described in the original issue text)

Advantages
This approach essentially disallows blank nodes in graphs that are signed using BBSSignature2020, by using the blank node identifier assignment algorithm to give every blank node an identifier. There is no custom processing of the normalized form downstream, i.e. when deriving or validating a proof.

Disadvantages
BBSSignature2020 will enforce the assignment of blank node identifiers, which could be problematic if the data graph was already signed with another suite. It also might create developer confusion as to why this special "id" field needs to be present.

Option 2

Advantages
No requirement for id in graphs signed with BBSSignature2020, graphs signed with other proof types would still be compatible

Disadvantages
Requires custom processing of the normalized form at the derive and verify proof steps.

OR13 · 2020-04-19T20:17:46Z

I guess option 2 is better, moves the complexity from external to internal, always better to shoulder the burden than expose a quirk to library consumers...

tplooker · 2020-04-19T20:20:15Z

Ok option 2 it is for the time being!

kdenhartog · 2020-04-19T23:54:57Z

probably worth checking that some of the details needed for generating the blank nodes aren't lost/modified when doing proof generation based on the original VC. If this holds such that the verifier doesn't get bad signatures when verifying then I think option 2 is the better route as well.

tplooker · 2020-05-19T21:31:43Z

An alternative proposal suggested by @dlongley is to have a mapping for any BN identifiers in the proof to prevent the need for special skolem URIs (e.g urn:bnid:_:c14n1).

This mapping must map the blank node identifiers allocated to the statements in the canonical form of the derivedProof back to what their blank node identifiers were in the originally signed proof.

Because all blank nodes have a common prefix, e.g. _:c14n*, we essentially only need a map between the numbers assigned to the blank nodes. For example, the mapping _:c14n1 => _:c14n0 could be shortened to 1 => 0.

We could express this in the proof in one of two ways.

An ordered array of signed integers, where the index in the array represents the blank node id in the derived proof and the value is the blank node id in the original proof. This would result in something like the following.

{
    .....data graph for which the proof applies to
    { 
    "type": "BbsBlsSignatureProof2020",
    "verificationMethod": "did:example:489398593#test",
    "proofPurpose": "assertionMethod",
    "proof":  "BAXSgt7mVHjjpH6H......" ,
    "requiredRevealStatements": [ 1, 3 ],
    "blankNodeIdentiferMap": [
         1,
         0,
         2,
         3
    ]
  } 
}

A map, where the key represents the blank node id in the derived proof and the value is the blank node id in the original proof. This would result in something like the following.

{
    .....data graph for which the proof applies to
    { 
    "type": "BbsBlsSignatureProof2020",
    "verificationMethod": "did:example:489398593#test",
    "proofPurpose": "assertionMethod",
    "proof":  "BAXSgt7mVHjjpH6H......" ,
    "requiredRevealStatements": [ 1, 3 ],
    "blankNodeIdentiferMap": {
         "0": 1,
         "1": 0,
         "2": 2,
         "3": 3
    }
  } 
}
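Either representation could be applied by a verifier along the lines of the Python sketch below. The `relabel` helper is hypothetical, and the pattern assumes `_:c14nN` labels that appear as whole terms. The substitution is done in a single pass so that swaps such as 0 => 1 and 1 => 0 do not interfere with each other.

```python
import re

# Matches a canonical blank node label that stands alone as a term.
LABEL = re.compile(r'(?<!\S)_:c14n(\d+)(?=\s)')


def relabel(statement, id_map):
    """Replace each derived-proof label number with its original number.

    id_map[i] is the original blank node number for derived number i; it can
    be a list (array form) or a dict with integer keys (map form).
    """
    return LABEL.sub(lambda m: f'_:c14n{id_map[int(m.group(1))]}', statement)


array_form = [1, 0, 2, 3]
# JSON object keys are strings, so convert them to integers first.
map_form = {int(k): v for k, v in {"0": 1, "1": 0, "2": 2, "3": 3}.items()}

derived = '_:c14n0 <http://schema.org/firstName> "Jane" .'
print(relabel(derived, array_form))
print(relabel(derived, map_form))
```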

@dlongley is this an accurate summary of the idea you had in mind?

dlongley · 2020-05-19T21:50:55Z

@tplooker,

@dlongley, is this an accurate summary of the idea you had in mind?

Yes. I also think that the mapping could be embedded in the proofValue -- allowing for some more compact representation, i.e., it doesn't have to be broken out at the JSON level.

tplooker · 2020-05-19T21:58:06Z

@dlongley Ah interesting that would further simplify the representation, we will review that as an idea

tplooker · 2020-05-19T22:36:49Z

@dlongley did you have any thoughts on how this binary encoding should be done? I.e. a custom binary format, or should we be using a more formalised method like Protocol Buffers or MessagePack?

The simplest custom way I see doing this would be something like the following

proofValue = base64(int16(blankNodeIdentiferMap.length)|int16(blankNodeIdentiferMap[0])..|..int16(blankNodeIdentiferMap[blankNodeIdentiferMap.length - 1])|remainingProofValueOutputedByDeriveProofAPI)

Where | indicates concatenation and blankNodeIdentiferMap is the unsigned int array described in the example above.
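A minimal Python sketch of this length-prefixed layout, using only the standard library, might look as follows. The function names are hypothetical, and the choice of unsigned 16-bit big-endian integers is an assumption; the formula above only says int16.

```python
import base64
import struct


def encode_proof_value(id_map, proof_bytes):
    """Prefix the raw proof with the map length and entries, then base64-encode."""
    # Assumption: every value is packed as an unsigned 16-bit big-endian integer.
    header = struct.pack(f'>{len(id_map) + 1}H', len(id_map), *id_map)
    return base64.b64encode(header + proof_bytes).decode('ascii')


def decode_proof_value(proof_value):
    """Split a proof value back into the identifier map and the raw proof bytes."""
    raw = base64.b64decode(proof_value)
    (count,) = struct.unpack_from('>H', raw, 0)
    id_map = list(struct.unpack_from(f'>{count}H', raw, 2))
    return id_map, raw[2 + 2 * count:]


# Placeholder bytes standing in for the output of the derive proof API.
encoded = encode_proof_value([1, 0, 2, 3], b'\x01\x02')
print(decode_proof_value(encoded))
```

The length prefix lets the verifier find where the map ends and the remaining proof bytes begin without any delimiter.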

dlongley · 2020-05-19T22:56:09Z

My vote for binary representations is CBOR. There is likely to be significant work in the future on CBOR-LD (including representing VCs using that format) and it would be best not to introduce yet another competing format with protocol buffers/etc. So whatever you guys come up with that makes sense in CBOR I would be +1 for.

kdenhartog · 2021-04-21T21:00:24Z

@TimoGlastra said during IIW that he was going to take this issue.

TimoGlastra · 2021-04-23T09:17:30Z

@kdenhartog RE your note in #37. I just discovered you can only assign outside collaborators when they've commented. So here's my comment

OR13 · 2023-05-01T14:30:52Z

Can we fold this into #60? I am going to mark it pending close.

Wind4Greg · 2023-12-11T18:39:23Z

Please see the updated document for mechanism details that address these concerns. See PR #101 for privacy considerations related to both data leakage and unlinkability, including analysis.

tplooker mentioned this issue May 4, 2020

Revealed statements representation #22

Closed

tplooker mentioned this issue May 19, 2020

Updates to proof syntax #30

Merged

kdenhartog added the 2020 Suite Errata label Apr 21, 2021

kdenhartog assigned kdenhartog and TimoGlastra and unassigned kdenhartog Apr 23, 2021

BasileiosKal mentioned this issue Jul 27, 2021

Blank node labels may leak information #60

Closed

OR13 added the pending-close label May 1, 2023

Wind4Greg removed the 2020 Suite Errata label Dec 11, 2023

Wind4Greg closed this as completed Dec 19, 2023
