Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IPIP-0421: HTTP Delegated Routing Reader Privacy Upgrade #421

Open
wants to merge 15 commits into
base: main
Choose a base branch
from

Conversation

ischasny
Copy link

@ischasny ischasny commented Jun 7, 2023

@ischasny ischasny requested a review from a team as a code owner June 7, 2023 13:18
@ischasny ischasny changed the title docs: IPIP for Delegated Routing Privacy Upgrade IPIP-0421: IPIP for Delegated Routing Privacy Upgrade Jun 7, 2023
Copy link
Member

@lidel lidel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for submitting this @ischasny.
I'm OoO for the rest of the week, but dropping some quick questions after very rushed first pass (fine to link to prior discussions, if these were already asked before).

src/ipips/ipip-0421.md Show resolved Hide resolved
src/ipips/ipip-0421.md Outdated Show resolved Hide resolved
src/routing/http-routing-reader-privacy-v1.md Outdated Show resolved Hide resolved
src/routing/http-routing-reader-privacy-v1.md Outdated Show resolved Hide resolved
Comment on lines 47 to 48
- **`MH`** is the [Multihash](https://github.com/multiformats/multihash) contained in a `CID`. It corresponds to the
digest of a hash function over some content. `MH` is represented as a 32-byte array.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean CIDs with longer hash functions are truncated at 32-byte mark?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the 32-byte length as indeed it can be longer.

src/ipips/ipip-0421.md Show resolved Hide resolved
src/routing/http-routing-reader-privacy-v1.md Outdated Show resolved Hide resolved
src/routing/http-routing-reader-privacy-v1.md Outdated Show resolved Hide resolved
src/routing/http-routing-reader-privacy-v1.md Outdated Show resolved Hide resolved
src/routing/http-routing-reader-privacy-v1.md Outdated Show resolved Hide resolved
@lidel lidel changed the title IPIP-0421: IPIP for Delegated Routing Privacy Upgrade IPIP-0421: HTTP Delegated Routing Reader Privacy Upgrade Jun 7, 2023
ischasny and others added 3 commits June 8, 2023 10:22
Co-authored-by: Marcin Rataj <lidel@lidel.org>
Co-authored-by: Marcin Rataj <lidel@lidel.org>
@guillaumemichel
Copy link
Contributor

LGTM!

src/ipips/ipip-0421.md Outdated Show resolved Hide resolved
src/ipips/ipip-0421.md Outdated Show resolved Hide resolved
src/ipips/ipip-0421.md Outdated Show resolved Hide resolved
src/ipips/ipip-0421.md Outdated Show resolved Hide resolved
src/routing/http-routing-reader-privacy-v1.md Outdated Show resolved Hide resolved
src/ipips/ipip-0421.md Outdated Show resolved Hide resolved
src/ipips/ipip-0421.md Outdated Show resolved Hide resolved
src/ipips/ipip-0421.md Outdated Show resolved Hide resolved
src/ipips/ipip-0421.md Outdated Show resolved Hide resolved
src/ipips/ipip-0421.md Outdated Show resolved Hide resolved
masih added 4 commits July 12, 2023 15:46
* Refine paragraphs for better readability.
* Change section on router selection based on code, since from multihash
code alone we cannot determine wheterh whether the request is encrypted
or not.
* Update alternatives section to explain how the IPIP can be enhanced
with OHTTP and Tor.
Refine routing specification and add byte frame diagram to clearly
illustrate the content of SALT values.
@masih masih requested review from lidel and aschmahmann July 12, 2023 16:21

#### Backwards Compatibility

Users will need to deliberately activate Reader Privacy on their nodes. A new flag could be introduced into IPFS implementations such as Kubo's HTTP Delegated Content Router configuration to streamline this process. Users on older nodes can continue using the existing API and switch on Reader Privacy later.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd hope this doesn't need to be the case in an application that has some IPFS smarts (rather than a simple HTTP client). If enough features are expressed through something like #388 then the client should be able to have plausible defaults here (e.g. if my delegated router supports IPNI + DHT, but only IPNI has double-hashing support and the client can run its own DHT client it could choose to send double-hashed requests to the delegated router for IPNI and do the DHT lookups itself).

Obviously some clients will still offer configurability (e.g. would you rather ask the delegated router to do DHT lookups for you in cleartext, or not do them at all) but having reasonable default behavior should be possible.


All salts below are 64-bytes long and represent a string padded with `\x00`.

- `SALT_DOUBLEHASH`: The string value `CR_DOUBLEHASH`, where each if the 13 characters are represented by their byte value. The remainder of the 64 bytes is filled with null bytes represented by `\x00`. This results in 51 null bytes after the `CR_DOUBLEHASH` string. The following illustrates its corresponding byte frame diagram:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- `SALT_DOUBLEHASH`: The string value `CR_DOUBLEHASH`, where each if the 13 characters are represented by their byte value. The remainder of the 64 bytes is filled with null bytes represented by `\x00`. This results in 51 null bytes after the `CR_DOUBLEHASH` string. The following illustrates its corresponding byte frame diagram:
- `SALT_DOUBLEHASH`: The string value `CR_DOUBLEHASH`, where each of the 13 characters are represented by their byte value. The remainder of the 64 bytes is filled with null bytes represented by `\x00`. This results in 51 null bytes after the `CR_DOUBLEHASH` string. The following illustrates its corresponding byte frame diagram:

43 52 5F 44 4F 55 42 4C 45 48 41 53 48 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
```

- `SALT_ENCRYPTIONKEY`: The string value `CR_ENCRYPTIONKEY`, where each if the 15 characters are represented by their byte value. The remainder of the 64 bytes is filled with null bytes represented by `\x00`. This results in 49 null bytes after the `CR_ENCRYPTIONKEY` string. The following illustrates its corresponding byte frame diagram:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- `SALT_ENCRYPTIONKEY`: The string value `CR_ENCRYPTIONKEY`, where each if the 15 characters are represented by their byte value. The remainder of the 64 bytes is filled with null bytes represented by `\x00`. This results in 49 null bytes after the `CR_ENCRYPTIONKEY` string. The following illustrates its corresponding byte frame diagram:
- `SALT_ENCRYPTIONKEY`: The string value `CR_ENCRYPTIONKEY`, where each of the 16 characters are represented by their byte value. The remainder of the 64 bytes is filled with null bytes represented by `\x00`. This results in 48 null bytes after the `CR_ENCRYPTIONKEY` string. The following illustrates its corresponding byte frame diagram:

43 52 5F 45 4E 43 52 59 50 54 49 4F 4E 4B 45 59 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
```

These magic values are utilized to compute distinct digests from identical values for varying purposes. For instance, a hash of a Multihash employed for lookups should differ from the one used for key derivation, despite originating from the same value. To achieve this, the Multihash is concatenated with different magic values before applying the hash function: `SALT_DOUBLEHASH` for lookups and `SALT_ENCRYPTIONKEY` for key derivation as elaborated in the `Glossary`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not strictly needed, but might be nice to explain a bit why the hashes should be different for when people coming looking at this later.

- **`CID`** stands for [Content IDentifier](https://github.com/multiformats/cid).
- **`MH`** refers to the [Multihash](https://github.com/multiformats/multihash) contained in a `CID`. It corresponds to the hash function's digest over certain content.
- **`HASH2`** is a second hash over the multihash. Second Hashes must follow the `Multihash` format with `SHA2_256` codec. The digest must be calculated as `hash(SALT_DOUBLEHASH || MH)`.
- **`ProviderRecord`** is a JSON with Provider Record as described in the [HTTP Delegated Routing Specification](http-routing-v1.md).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

Suggested change
- **`ProviderRecord`** is a JSON with Provider Record as described in the [HTTP Delegated Routing Specification](http-routing-v1.md).
- **`ProviderRecord`** is a JSON object with Provider Record as described in the [HTTP Delegated Routing Specification](http-routing-v1.md).


- `EncProviderRecordKeys` is a list of base64 encoded `EncProviderRecordKey`;

#### `GET /routing/v1/encrypted/metadata/{HashProviderRecordKey}`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question about encoding as for HASH2

- **`ProviderRecord`** is a JSON with Provider Record as described in the [HTTP Delegated Routing Specification](http-routing-v1.md).
- **`ProviderRecordKey`** is a concatenation of `peerID || contextID`. Explicit encoding lengths are unnecessary as they are inherently encoded as part of the multihash format. Max `contextID` length is 64 bytes.
- **`EncProviderRecordKey`** is `Nonce || enc(deriveKey(multihash), Nonce, ProviderRecordKey)`. Max `EncProviderRecordKey` is 200 bytes.
- **`HashProviderRecordKey`** is a hash over `ProviderRecordKey`, calculated as `hash(SALT_DOUBLEHASH || ProviderRecordKey)`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a SHA256 multihash as well?

- **`CID`** stands for [Content IDentifier](https://github.com/multiformats/cid).
- **`MH`** refers to the [Multihash](https://github.com/multiformats/multihash) contained in a `CID`. It corresponds to the hash function's digest over certain content.
- **`HASH2`** is a second hash over the multihash. Second Hashes must follow the `Multihash` format with `SHA2_256` codec. The digest must be calculated as `hash(SALT_DOUBLEHASH || MH)`.
- **`ProviderRecord`** is a JSON with Provider Record as described in the [HTTP Delegated Routing Specification](http-routing-v1.md).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it? The routing-v1 spec allows for opaque blobs in the provider record. Where's the line between "metadata" and "provider record" here?

Comment on lines +91 to +97
```json
{
"EncProviderRecordKeys": [
"EBxdYDhd.....",
"IOknr9DK....."
]
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be helpful/illustrative for the encrypted data to also show what it's expected the unencrypted form would look like.


### Notes

Assembling a full `ProviderRecord` from the encrypted data requires multiple server roundtrips. The first fetches a list of `EncProviderRecordKey`s, followed by one for each `EncProviderRecordKey` to retrieve `EncMetadata`. To minimize the number of roundtrips to one, the client implementation should use the local libp2p peerstore for multiaddress discovery and [libp2p multistream select](https://github.com/multiformats/multistream-select) for protocol negotiation.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@masih we've talked about this before and it's more exploratory, but I'm noting here to make it public and see if folks have any thoughts about how this might impact the interface/API here in the future.

Note: This is not a "please rewrite everything" request.


How well this works is a function of how important the metadata is to performing a useful retrieval, and how important the metadata is depends on the distribution of information between the "ProviderRecord" and the "(ProviderRecord)Metadata".

IIUC the reason it's implemented this way is to keep data storage in routing backends like IPNI from needing to store the same data but encrypted many many times (i.e. once per multihash advertised).

At the extremes we have:

  • A lot of space could be saved by making ProviderRecord just contain a small pointer and Metadata contain all the actual provider record information (e.g. peerID, multiaddrs, protocols, ...). This means two round-trips unless the client already has the pointer information locally (e.g. if the pointer was a peerID, then having the multiaddrs, etc. locally, and if the pointer was some system-specific unique-ID then having that cached from a prior lookup). Also, unless there's some aggregation service/proxy it allows correlation between many different requests that use the same metadata (not necessarily a big deal here).
  • The second round-trip could disappear if we store all the information in the ProviderRecord portion. However, this means encrypting all the data for every advertised multihash

I could situationally see reasons to shuffle data between these two, depending on things like:

  1. how reusable the metadata is
  2. how frequently the metadata information is to be cached
  3. cost models for routing systems
    ...

Two areas where I could see the extremes being in use:

  1. Save storage by just returning the target's "identifier": e.g. libp2p peerID or an unauthenticated HTTP+libp2p URL (that has .well_known for protocol negotiation) since peer routing makes sense to be separable in libp2p, and protocol negotiation to be separate for unauthenticated HTTP + libp2p.
  2. Save round-trips by returning all the information: e.g. a webseeds-like advertisement that points to an outboard blake3 HTTP URL and an HTTP URL for the data (data could be a separate advertisement)

This makes me wonder if there's a better way to do this. For example:

  • encrypted/providers returns a (JSON) blob mimicking the routing-v1 results
  • It contains some information indicating if there's metadata and/or what might be in the metadata
  • encrypted/metadata provides the metadata

This could allow systems like IPNI to optimize data layouts on ingestion and have some flexibility without breaking downstream clients.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
Status: 🔎 In Review
Development

Successfully merging this pull request may close these issues.

6 participants