IPIP-337: Delegated Content Routing HTTP API #337
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,116 @@ | ||
# IPIP-337: Delegated Content Routing HTTP API | ||
|
||
- Start Date: 2022-10-18 | ||
- Related Issues: | ||
- https://github.com/ipfs/specs/pull/337 | ||
|
||
## Summary | ||
|
||
This IPIP specifies an HTTP API for delegated content routing. | ||
|
||
## Motivation | ||
|
||
Idiomatic and first-class HTTP support for delegated routing is an important requirement for large content routing providers, | ||
and supporting large content providers is a key strategy for driving down IPFS content routing latency. | ||
These providers must handle high volumes of traffic and support many users, so leveraging industry-standard tools and services | ||
such as HTTP load balancers, CDNs, reverse proxies, etc. is a requirement. | ||
To maximize compatibility with standard tools, IPFS needs an HTTP API specification that uses standard HTTP idioms and payload encoding. | ||
The [Reframe spec](https://github.com/ipfs/specs/blob/main/reframe/REFRAME_PROTOCOL.md) for delegated content routing is an experimental attempt at this, | ||
but it has resulted in a very unidiomatic HTTP API which is difficult to implement and is incompatible with many existing tools. | ||
The cost of properly redesigning, implementing, and maintaining Reframe is too high relative to the urgency of having a delegated content routing HTTP API. | ||
|
||
Note that this does not supplant nor deprecate Reframe. Ideally in the future, Reframe and its implementation would receive the resources needed to map the IDL to idiomatic HTTP, | ||
and implementations of this spec could then be rewritten in the IDL, maintaining backwards compatibility. | ||
|
||
We expect this API to be extended beyond "content routing" in the future, so additional IPIPs may rename this to something more general such as "Delegated Routing HTTP API". | ||
|
||
## Detailed design | ||
|
||
See the [Delegated Content Routing HTTP API spec](../routing/DELEGATED_CONTENT_ROUTING_HTTP.md) included with this IPIP. | ||
|
||
## Design rationale | ||
|
||
To understand the design rationale, it is important to consider the concrete Reframe limitations that we know about: | ||
|
||
- Reframe [method types](../reframe/REFRAME_KNOWN_METHODS.md) using the HTTP transport are encoded inside IPLD-encoded messages | ||
- This prevents URL-based pattern matching on methods, which makes it hard and expensive to do basic HTTP scaling and optimizations: | ||
- Configuring different caching strategies for different methods | ||
- Configuring reverse proxies on a per-method basis | ||
- Routing methods to specific backends | ||
- Method-specific reverse proxy config such as timeouts | ||
- Developer UX is poor as a result, e.g. for CDN caching you must encode the entire request message and pass it as a query parameter | ||
- This was initially done by URL-escaping the raw bytes | ||
- Not possible to consume correctly using standard JavaScript (see [edelweiss#61](https://github.com/ipld/edelweiss/issues/61)) | ||
- Shipped in Kubo 0.16 | ||
- Packing a CID into a struct, encoding it with DAG-CBOR, multibase-encoding that, percent-encoding that, and then passing it in a URL, rather than merely passing the CID in the URL, is needlessly complex from a user's perspective, and has already made it difficult to manually construct requests or interpret logs | ||
- Added complexity of "Cacheable" methods supporting both POSTs and GETs | ||
- The required streaming support and message groups add a lot of implementation complexity, but streaming does not currently work for cacheable methods sent over HTTP | ||
- E.g., for FindProviders, the response is buffered anyway for ETag calculation | ||
- There are no limits on response sizes nor ways to impose limits and paginate | ||
- This is useful for routers that have highly variable resolution time, to send results as soon as possible, but this is not a use case we are focusing on right now and we can add it later | ||
- The Identify method is not implemented because it is not currently useful | ||
|
||
- This is because Reframe's ambition is to be a generic catch-all bag of methods across protocols, while the delegated routing use case only requires a subset of its methods. | ||
- Client and server implementations are difficult to write correctly, because of the non-standard wire formats and conventions | ||
|
||
- Example: [bug reported by implementer](https://github.com/ipld/edelweiss/issues/62), and [another one](https://github.com/ipld/edelweiss/issues/61) | ||
- The Go implementation is [complex](https://github.com/ipfs/go-delegated-routing/blob/main/gen/proto/proto_edelweiss.go) and [brittle](https://github.com/ipfs/go-delegated-routing/blame/main/client/provide.go#L51-L100), and is currently maintained by IPFS Stewards who are already over-committed with other priorities | ||
- Only the HTTP transport has been designed and implemented, so it's unclear if the existing design will work for other transports, and what their use cases and requirements are | ||
|
||
- This means Reframe can't be trusted to be transport-agnostic until there is at least a second transport implemented (e.g. as a reframe-over-libp2p protocol) | ||
- There's naming confusion around "Reframe, the protocol" and "Reframe, the set of methods" | ||
|
||
So this API proposal makes the following changes: | ||
|
||
|
||
- The Delegated Content Routing API is defined using HTTP semantics, and can be implemented without introducing Reframe concepts nor IPLD | ||
- There is a clear distinction between the RPC protocol (HTTP) and the API (Delegated Content Routing) | ||
- "Method names" and cache-relevant parameters are pushed into the URL path | ||
- Streaming support is removed, and default response size limits are added along with an optional `pageLimit` parameter for clients to specify response sizes | ||
- We will add streaming support in a subsequent IPIP, but we are trying to minimize the scope of this IPIP to what is immediately useful | ||
- Bodies are encoded using idiomatic JSON, instead of using IPLD codecs, and are compatible with OpenAPI specifications | ||
- The JSON uses human-readable string encodings of common data types | ||
- CIDs are encoded as CIDv1 strings with a multibase prefix (e.g. base32), for consistency with CLIs, browsers, and [gateway URLs](https://docs.ipfs.io/how-to/address-ipfs-on-web/) | ||
- Multiaddrs use the [human-readable format](https://github.com/multiformats/multiaddr#specification) that is used in existing tools and Kubo CLI commands such as `ipfs id` or `ipfs swarm peers` | ||
- Byte array values, such as signatures, are multibase-encoded strings (with an `m` prefix indicating Base64) | ||
- The "Identify" method and "message groups" are not included | ||
- The "GetIPNS" and "PutIPNS" methods are not included | ||
|
||
|
||
### User benefit | ||
|
||
The cost of building and operating content routing services will be much lower, as developers will be able to maximally reuse existing industry-standard tooling. | ||
Users will not need to learn a new RPC protocol and tooling to consume or expose the API. | ||
This will result in more content routing providers, each providing a better experience for users, driving down content routing latency across the IPFS network | ||
and increasing data availability. | ||
|
||
### Compatibility | ||
|
||
#### Backwards Compatibility | ||
|
||
IPFS Stewards will implement this API in [go-delegated-routing](https://github.com/ipfs/go-delegated-routing), using breaking changes in a new minor version. | ||
Because the existing Reframe spec can't be safely used in JavaScript and we won't be investing time and resources into changing the wire format implemented in edelweiss to fix it, | ||
the experimental support for Reframe in Kubo will be deprecated in the next release and delegated content routing will subsequently use this HTTP API. | ||
We may decide to re-add Reframe support in the future once these issues have been resolved. | ||
|
||
#### Forwards Compatibility | ||
|
||
Standard HTTP mechanisms for forward compatibility are used: | ||
- The API is versioned using a version number prefix in the path | ||
- The `Accept` and `Content-Type` headers are used for content type negotiation, allowing for backwards-compatible additions of new MIME types, hypothetically such as: | ||
- `application/cbor` for binary-encoded responses | ||
- `application/x-ndjson` for streamed responses | ||
- `application/octet-stream` if the content router can provide the content/block directly | ||
- New paths+methods can be introduced in a backwards-compatible way | ||
- Parameters can be added using either new query parameters or new fields in the request/response body. | ||
- Provider records are both opaque and versioned to allow evolution of schemas and semantics for the same transfer protocol | ||
|
||
As a proof-of-concept, the tests for the initial implementation of this HTTP API were successfully run over a libp2p transport using [libp2p/go-libp2p-http](https://github.com/libp2p/go-libp2p-http), demonstrating the viability of also using this API over libp2p. | ||
|
||
### Security | ||
|
||
- TODO: cover user privacy | ||
- TODO: parsing best practices: what are limits (e.g., per message / field)? | ||
|
||
### Alternatives | ||
|
||
- Reframe (general-purpose RPC) was evaluated; see the "Design rationale" section for why it was not selected. | ||
|
||
### Copyright | ||
|
||
Copyright and related rights waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/). |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,184 @@ | ||
# Delegated Content Routing HTTP API | ||
|
||
![wip](https://img.shields.io/badge/status-wip-orange.svg?style=flat-square) Delegated Content Routing HTTP API | ||
|
||
|
||
**Author(s)**: | ||
|
||
- Gus Eggert | ||
|
||
**Maintainer(s)**: | ||
|
||
* * * | ||
|
||
**Abstract** | ||
|
||
"Delegated content routing" is a mechanism for IPFS implementations to use for offloading content routing to another process/server. This spec describes an HTTP API for delegated content routing. | ||
|
||
## API Specification | ||
|
||
The Delegated Content Routing HTTP API uses the `application/json` content type by default. | ||
|
||
As such, human-readable encodings of types are preferred. This spec may be updated in the future with a compact `application/cbor` encoding, in which case compact encodings of the various types would be used. | ||
|
||
## Common Data Types | ||
|
||
- CIDs are always string-encoded using a [multibase](https://github.com/multiformats/multibase)-encoded [CIDv1](https://github.com/multiformats/cid#cidv1). | ||
- Multiaddrs are string-encoded according to the [human-readable multiaddr specification](https://github.com/multiformats/multiaddr#specification) | ||
- Peer IDs are string-encoded according to the [PeerID string representation specification](https://github.com/libp2p/specs/blob/master/peer-ids/peer-ids.md#string-representation) | ||
- Multibase bytes are string-encoded according to [the Multibase spec](https://github.com/multiformats/multibase), and *should* use base64. | ||
- Timestamps are Unix millisecond epoch timestamps | ||
|
||
Until required for business logic, servers should treat these types as opaque strings, and should preserve unknown JSON fields. | ||
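The byte and timestamp encodings listed above can be illustrated with a minimal Python sketch. It assumes the multibase `m` prefix, which denotes base64 without padding; the helper names are hypothetical and not part of this spec.

```python
import base64
import time


def encode_multibase_base64(data: bytes) -> str:
    """Encode bytes as a multibase string: 'm' prefix + unpadded base64."""
    return "m" + base64.b64encode(data).decode("ascii").rstrip("=")


def decode_multibase_base64(text: str) -> bytes:
    """Decode a multibase string; this sketch only handles the 'm' prefix."""
    if not text.startswith("m"):
        raise ValueError("only the 'm' (base64) multibase prefix is handled here")
    body = text[1:]
    # Restore the padding that multibase base64 omits.
    padding = "=" * (-len(body) % 4)
    return base64.b64decode(body + padding)


def unix_millis_now() -> int:
    """Current time as a Unix millisecond epoch timestamp."""
    return time.time_ns() // 1_000_000
```

A server that does not need to inspect these values can skip decoding entirely and pass the strings through unchanged.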
|
||
### Versioning | ||
|
||
This API uses a standard version prefix in the path, such as `/v1/...`. If a backwards-incompatible change must be made, then the version number should be increased. | ||
|
||
### Provider Records | ||
|
||
A provider record contains information about a content provider, including the transfer protocol and any protocol-specific information useful for fetching the content from the provider. | ||
|
||
The information required to write a record to a router (*"write" provider records*) may be different than the information contained when reading provider records (*"read" provider records*). | ||
|
||
For example, indexers may require a signature in `bitswap` write records for authentication of the peer contained in the record, but the read records may not include this authentication information. | ||
|
||
Both read and write provider records have a minimal required schema as follows: | ||
|
||
```json | ||
{ | ||
"Protocol": "<transfer_protocol_name>", | ||
"Schema": "<transfer_protocol_schema>", | ||
... | ||
} | ||
``` | ||
Comment on lines +49 to +50:
What do we do for "unidentified libp2p protocol, maybe" which is what we get from the DHT? The DHT returns a set of peerIDs (and sometimes addresses) for a given peer but does not guarantee anything about the data transfer protocol supported (if any). Note: that provider records are used for IPNS-over-PubSub. So a protocol name could be invented for that, but in any event provider records are in use today for more than just Bitswap anyway so defining "Bitswap" as "some libp2p protocol" doesn't seem like a particularly good idea.
Reply:
Continued in #377 |
|
||
Where: | ||
|
||
- `Protocol` is the multicodec name of the transfer protocol | ||
|
||
- `Schema` denotes the schema to use for encoding/decoding the record | ||
Review comment:
How does the use of Schema interact with systems like IPNI supporting arbitrary user protocols without requiring PRs and spec changes to IPNI and deployments like cid.contact? For example, IMO it's reasonable that IPNI nodes should be able to have some code that looks roughly like:
Is there some generic schema label that's supposed to be used for opaque blobs?
Reply:
Continued in #377 |
||
- This is separate from the `Protocol` to allow this HTTP API to evolve independently of the transfer protocol | ||
- Implementations should switch on this when parsing records, not on `Protocol` | ||
- `...` denotes opaque JSON, which may contain information specific to the transfer protocol | ||
|
||
Specifications for some transfer protocols are provided in the "Transfer Protocols" section. | ||
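The rule to switch on `Schema` rather than `Protocol` can be sketched as follows. This is an illustrative Python outline, not a reference implementation: the parser registry and function names are hypothetical, and the `ID` and `Addrs` fields come from the Bitswap read record described in the "Known Transfer Protocols" section.

```python
import json


def parse_bitswap(record: dict) -> dict:
    """Extract the fields this sketch understands from a `bitswap` schema record."""
    return {
        "kind": "bitswap",
        "peer_id": record["ID"],
        "addrs": list(record.get("Addrs", [])),
    }


# Hypothetical registry keyed by Schema (not by Protocol).
PARSERS = {"bitswap": parse_bitswap}


def parse_provider_record(raw: str) -> dict:
    record = json.loads(raw)
    for field in ("Protocol", "Schema"):
        if not isinstance(record.get(field), str):
            raise ValueError(f"provider record is missing required field {field!r}")
    parser = PARSERS.get(record["Schema"])
    if parser is None:
        # Unknown schema: treat the whole record as opaque JSON.
        return {"kind": "opaque", "record": record}
    parsed = parser(record)
    # Keep the original object so unknown fields survive re-serialization.
    parsed["record"] = record
    return parsed
```

Keeping the original object alongside the parsed view is one way to preserve unknown JSON fields when a record is forwarded.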
|
||
## API | ||
|
||
|
||
### `GET /routing/v1/providers/{CID}` | ||
|
||
- Response codes | ||
- `200`: the response body contains 0 or more records | ||
- `404`: must be returned if no matching records are found | ||
- `422`: request does not conform to schema or semantic constraints | ||
- Response Body | ||
|
||
```json | ||
{ | ||
"Providers": [ | ||
{ | ||
"Protocol": "<protocol_name>", | ||
"Schema": "<schema>", | ||
... | ||
} | ||
] | ||
} | ||
``` | ||
|
||
- Default limit: 100 providers | ||
- Optional query parameters | ||
|
||
- `transfer` only return providers who support the passed transfer protocols, expressed as a comma-separated list of transfer protocol names such as `transfer=bitswap,filecoin-graphsync-v1` | ||
Review comment:
Just want to flag that
Feels like gatekeeping system which makes it impossible to experiment with novel transfer protocols. Last year I suggested to not invent own strings, avoid transport protocol codes from Since If we want to keep it, need to ensure we are not gatekeeping:
Option A
Ensure clients can talk the same version of bitswap by reusing strings from libp2p identify's protocols list:
$ ipfs id | jq .Protocols
[
"/ipfs/bitswap",
"/ipfs/bitswap/1.0.0",
"/ipfs/bitswap/1.1.0",
"/ipfs/bitswap/1.2.0",
"/libp2p/fetch/0.0.1",
...
]
We could update spec and say that every value that starts with
Option B
Alternative, is to just accept number, and allow people to use codes from reserved private range
Reply:
IMO using the numbers from the global table feels pretty bad for experimentation as well. The existing IPNI implementation doesn't support them and they require some gatekeeping anyway. For the time being I've put up ipni/specs#6. It seems like we could reasonably integrate this with something like the
Unfortunately, this type of query isn't supported in IPNI today and if it did it'd probably be with some specific semantics (e.g. adding a field to the "BitswapMetadata" section that's a list of protocolIDs) or a similar thing for GraphSync, unless they're going to reserve a new number every time they modify the protocol). If we started leveraging named-record in addition to (or instead of) numbers in the global table we could just query against those. Basically, this would mean that the query could be fulfilled using custom logic per-number, or just using the
Since the general policy is not to merge specs without implementations we should remove and then re-add later with the implementation, right?
Reply:
IPNI imposes no restriction on the protocol ID in metadata; this can be any number and treated as arbitrary bytes by the indexers.
Happy to capture an issue on this if this is needed?
It'd probably be varint prefix matching.
i'm sorry i am struggling to see how using names instead of numbers would make a difference in this case. Surely we can do the same with numbers?
Reply:
I think the consensus is to remove this, but the idea was that these are treated by this spec as opaque strings used for filtering, and the servers have no requirement to enforce any particular values (unless they want/need to). But happy to have this conversation another day, since we never implemented this anyway. |
||
- `transport` for provider records with multiaddrs, only return records with multiaddrs explicitly supporting the passed transport protocols, encoded as decimal multicodec codes such as `transport=460,478` (`/quic` and `/tls/ws` respectively) | ||
|
||
- Implements pagination according to the Pagination section | ||
|
||
Each object in the `Providers` list is a *read provider record*. | ||
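As a rough illustration of how a client might build a request URL for this endpoint, the Python sketch below assembles the path and the optional query parameters described in this draft. The function name, the base URL, and the CID placeholder are hypothetical.

```python
from urllib.parse import quote, urlencode


def providers_url(base_url, cid, transfer=(), transport=(),
                  page_limit=None, page_token=None):
    """Build a GET /routing/v1/providers/{CID} URL with optional filters."""
    query = {}
    if transfer:
        # Comma-separated transfer protocol names.
        query["transfer"] = ",".join(transfer)
    if transport:
        # Comma-separated decimal multicodec codes.
        query["transport"] = ",".join(str(code) for code in transport)
    if page_limit is not None:
        query["pageLimit"] = str(page_limit)
    if page_token is not None:
        query["pageToken"] = page_token
    url = f"{base_url.rstrip('/')}/routing/v1/providers/{quote(cid, safe='')}"
    if query:
        # Keep commas readable instead of percent-encoding them.
        url += "?" + urlencode(query, safe=",")
    return url


# Example with placeholder values (not a real router or CID).
print(providers_url("https://router.example", "bafyexamplecid",
                    transfer=["bitswap"], transport=[460], page_limit=10))
```

Because the CID and the filters live in the URL, standard HTTP caches and reverse proxies can key on them directly.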
|
||
## Pagination | ||
|
||
APIs that return collections of results should support pagination as follows: | ||
|
||
- If there are more results, then a `NextPageToken` field should include an opaque string value, otherwise it should be undefined | ||
- The value of `NextPageToken` can be specified as the value of a `pageToken` query parameter to fetch the next page | ||
- Character set is restricted to the regular expression `[a-zA-Z0-9-_.~]+`, since this is intended to be used in URLs | ||
- The client continues this process until `NextPageToken` is undefined or the client no longer wishes to continue | ||
- A `pageLimit` query parameter specifies the maximum size of a single page | ||
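The client side of this pagination scheme can be sketched as a simple loop. In this hedged Python example, `fetch_page` is a hypothetical callable supplied by the caller that performs the HTTP request and returns the decoded JSON body; only the `Providers` and `NextPageToken` field names come from this spec.

```python
from typing import Callable, Iterator, Optional


def iter_providers(
    fetch_page: Callable[[Optional[str], Optional[int]], dict],
    page_limit: Optional[int] = None,
) -> Iterator[dict]:
    """Yield provider records page by page until NextPageToken is absent."""
    token: Optional[str] = None
    while True:
        page = fetch_page(token, page_limit)
        # A page may legitimately contain zero records and still have a next page.
        yield from page.get("Providers", [])
        token = page.get("NextPageToken")
        if not token:
            return
```

Since this is a generator, a caller that has seen enough results can simply stop iterating, which matches the rule that the client may stop at any time.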
|
||
|
||
|
||
### Implementation Notes | ||
|
||
Servers are required to return *at most* `pageLimit` results in a page. It is recommended for pages to be as dense as possible, but it is acceptable for them to return any number of items in the closed interval [0, pageLimit]. This is dependent on the capabilities of the backing database implementation. | ||
For example, a query specifying a `transfer` filter for a rare transfer protocol should not *require* the server to perform a very expensive database query for a single request. Instead, this is left to the server implementation to decide based on the constraints of the database. | ||
|
||
Implementations should encode into the token whatever information is necessary for fetching the next page. This could be a base32-encoded JSON object like `{"offset":3,"limit":10}`, an object ID of the last scanned item, etc. | ||
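One possible token encoding, following the base32-encoded JSON idea above, is sketched here in Python. The function names are hypothetical. Standard base32 output contains `=` padding, which is outside the character set allowed for tokens, so this sketch strips the padding on encode and restores it on decode.

```python
import base64
import json
import re

# Same character set as the spec's `[a-zA-Z0-9-_.~]+`, with the hyphen moved last.
TOKEN_RE = re.compile(r"[A-Za-z0-9_.~-]+")


def encode_page_token(state: dict) -> str:
    """Serialize pagination state to a URL-safe, unpadded base32 string."""
    raw = json.dumps(state, separators=(",", ":")).encode("utf-8")
    return base64.b32encode(raw).decode("ascii").rstrip("=").lower()


def decode_page_token(token: str) -> dict:
    """Validate the token's character set and recover the pagination state."""
    if not TOKEN_RE.fullmatch(token):
        raise ValueError("page token contains characters outside the allowed set")
    # base32 works in 8-character groups, so restore the stripped padding.
    padded = token.upper() + "=" * (-len(token) % 8)
    return json.loads(base64.b32decode(padded))


token = encode_page_token({"offset": 3, "limit": 10})
print(token, decode_page_token(token))
```

Clients should still treat the token as opaque. A server that does not want clients to tamper with the state could sign or encrypt it, which is outside the scope of this sketch.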
|
||
|
||
## Error Codes | ||
|
||
- `501`: must be returned if a method/path is not supported | ||
- `429`: may be returned to indicate to the caller that it is issuing requests too quickly | ||
- `400`: must be returned if an unknown path is requested | ||
|
||
|
||
## CORS and Web Browsers | ||
|
||
Browser interoperability requires implementations to support | ||
[CORS](https://developer.mozilla.org/en-US/docs/Web/HTTP/CORS). | ||
|
||
A JavaScript client running on a third-party Origin must be able to send HTTP | ||
requests to the endpoints defined in this specification, and read the received | ||
values. This means an HTTP server implementing this API must (1) support | ||
[CORS preflight requests](https://developer.mozilla.org/en-US/docs/Glossary/Preflight_request) | ||
sent as HTTP OPTIONS, and (2) always respond with headers that remove CORS | ||
limits, allowing every site to query the API for results: | ||
|
||
```plaintext | ||
Access-Control-Allow-Origin: * | ||
Access-Control-Allow-Methods: GET, OPTIONS | ||
``` | ||
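A minimal sketch of these CORS requirements, using Python's standard `http.server` module, might look like the following. The handler class name and the placeholder response body are assumptions for illustration; a real server would look up provider records and choose the status code accordingly.

```python
from http.server import BaseHTTPRequestHandler

# The two headers required by this section.
CORS_HEADERS = {
    "Access-Control-Allow-Origin": "*",
    "Access-Control-Allow-Methods": "GET, OPTIONS",
}


class RoutingHandler(BaseHTTPRequestHandler):
    def _send_cors_headers(self) -> None:
        for name, value in CORS_HEADERS.items():
            self.send_header(name, value)

    def do_OPTIONS(self) -> None:
        # Answer CORS preflight requests with no body.
        self.send_response(204)
        self._send_cors_headers()
        self.end_headers()

    def do_GET(self) -> None:
        # Placeholder body; the CORS headers are attached to every response.
        body = b'{"Providers": []}'
        self.send_response(200)
        self._send_cors_headers()
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

To try it locally, the handler can be passed to `http.server.HTTPServer(("127.0.0.1", 8080), RoutingHandler)` followed by `serve_forever()`; the address and port are arbitrary.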
|
||
## Known Transfer Protocols | ||
|
||
This section contains a non-exhaustive list of known transfer protocols (by name) that may be supported by clients and servers. | ||
|
||
### Bitswap | ||
|
||
Multicodec name: `transport-bitswap` | ||
Schema: `bitswap` | ||
|
||
|
||
#### Bitswap Read Provider Records | ||
|
||
```json | ||
{ | ||
"Protocol": "transport-bitswap", | ||
"Schema": "bitswap", | ||
"ID": "12D3K...", | ||
"Addrs": ["/ip4/..."] | ||
} | ||
``` | ||
|
||
- `ID`: the [Peer ID](https://github.com/libp2p/specs/blob/master/peer-ids/peer-ids.md) to contact | ||
- `Addrs`: a list of known multiaddrs for the peer | ||
- This list may be incomplete or incorrect and should only be treated as *hints* to improve performance by avoiding extra peer lookups | ||
|
||
The server should respect a passed `transport` query parameter by filtering against the `Addrs` list. | ||
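Filtering a Bitswap record against the `transport` parameter could be sketched as below. This Python example makes several assumptions that are not stated by the spec: the code-to-name table is a small illustrative subset, the multiaddr check is a naive split on `/` rather than a real multiaddr parser, and records with no matching address are dropped while matching records keep only their matching addresses.

```python
from typing import Iterable, Optional

# Illustrative subset of multicodec codes to multiaddr protocol names.
TRANSPORT_NAMES = {6: "tcp", 460: "quic"}


def filter_by_transport(record: dict, codes: Iterable[int]) -> Optional[dict]:
    """Return the record restricted to matching Addrs, or None if none match."""
    codes = list(codes)
    if not codes:
        return record  # no filter requested
    wanted = {TRANSPORT_NAMES[c] for c in codes if c in TRANSPORT_NAMES}
    # Naive check: a multiaddr "supports" a transport if its name is a component.
    addrs = [a for a in record.get("Addrs", []) if wanted & set(a.split("/"))]
    if not addrs:
        return None
    # Copy the record so unknown fields are preserved.
    return {**record, "Addrs": addrs}
```

A production server would more likely parse the multiaddrs with a proper library and push the filter down into its database query.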
|
||
### Filecoin Graphsync | ||
|
||
|
||
Multicodec name: `transport-graphsync-filecoinv1` | ||
Schema: `graphsync-filecoinv1` | ||
|
||
#### Filecoin Graphsync Read Provider Records | ||
|
||
```json | ||
{ | ||
"Protocol": "transport-graphsync-filecoinv1", | ||
"Schema": "graphsync-filecoinv1", | ||
"ID": "12D3K...", | ||
"Addrs": ["/ip4/..."], | ||
"PieceCID": "<cid>", | ||
"VerifiedDeal": true, | ||
"FastRetrieval": true | ||
} | ||
|
||
``` | ||
|
||
- `ID`: the peer ID of the provider | ||
- `Addrs`: a list of known multiaddrs for the provider | ||
- `PieceCID`: the CID of the [piece](https://spec.filecoin.io/systems/filecoin_files/piece/#section-systems.filecoin_files.piece) within which the data is stored | ||
- `VerifiedDeal`: whether the deal corresponding to the data is verified | ||
- `FastRetrieval`: whether the provider claims there is an unsealed copy of the data available for fast retrieval |
Review comment:
Why not just write the spec within the IDL of the data and define the transport to be this? It seems like it'd be easy enough except for the areas where the divergence of this API runs counter to some of the Reframe goals, which seem worth discussing. For example, I put an alternative that seems to capture some of your major changes below.