
Specify the HTTP Gateway protocol #20

Merged (14 commits) on May 3, 2022
3 changes: 1 addition & 2 deletions default.nix
@@ -34,8 +34,7 @@ rec {
-R $PWD -D $out/$doc_path/ index.adoc
find . -type f -name '*.png' | cpio -pdm $out/$doc_path/
cp *.cddl $out/$doc_path
cp ic.did $out/$doc_path
cp *.did $out/$doc_path
mkdir -p $out/nix-support
echo "report spec $out/$doc_path index.html" >> $out/nix-support/hydra-build-products
41 changes: 41 additions & 0 deletions spec/http-gateway.did
@@ -0,0 +1,41 @@
type HeaderField = record { text; text; };

type HttpRequest = record {
method: text;
url: text;
headers: vec HeaderField;
body: blob;
};

type HttpResponse = record {
status_code: nat16;
headers: vec HeaderField;
body: blob;
upgrade : opt bool;

Typo: upgrade -> update

Contributor Author

Unfortunately(?) this field is called upgrade, as in “upgrade the call from query to an update call”. Very confusing… oh well.

See https://github.com/dfinity/icx-proxy/pull/6/files#diff-42cb6807ad74b3e201c5a7ca98b911c5fa08380e942be6e4ac5807f8377f87fcR259

Contributor

Probably too late to change, but would promote be a better name?

Contributor Author

Probably. Or resend_as_update_call. Or, in line with the pattern for the streaming callback, it could even pass a func ref.

In a way it's not too late: we can change this while keeping backward compat in the implementations (there is only one so far) relatively easily (the proxy can simply keep both in its internal expected type). I just wonder if we should do it with this PR, or maybe as a separate step, to get this in first.

streaming_strategy: opt StreamingStrategy;
};

// Each canister that uses the streaming feature gets to choose their concrete
// type; the HTTP Gateway will treat it as an opaque value that is only fed to
// the callback method

type StreamingToken = /* application-specific type */


type StreamingCallbackHttpResponse = record {
body: blob;
token: opt StreamingToken;
};

type StreamingStrategy = variant {
Callback: record {
callback: func (StreamingToken) -> (opt StreamingCallbackHttpResponse) query;
token: StreamingToken;
};
};

service : {
http_request: (request: HttpRequest) -> (HttpResponse) query;
http_request_update: (request: HttpRequest) -> (HttpResponse);
}

I think it'd be clearer to make reference to the streaming callback here, but I understand why you didn't. (It's optional and also dynamically specified in HttpResponse.)

Contributor Author

The name is arbitrary, right? So I wouldn't quite know how to include it.


140 changes: 140 additions & 0 deletions spec/index.adoc
@@ -1949,6 +1949,146 @@ lookup_path(["d"], pruned_tree) = Found "morning"
lookup_path(["e"], pruned_tree) = Absent
....

[#http-gateway]
== The HTTP Gateway protocol

This section specifies the _HTTP Gateway protocol_, which allows canisters to handle conventional HTTP requests.

This feature relies on an _HTTP Gateway_ that translates between HTTP requests and the IC protocol. Such a gateway could be a stand-alone proxy, could be implemented in a web browser (natively, via a plugin, or via a service worker), or could take other forms. This document describes the interface and semantics of this protocol independently of any concrete Gateway, so that all Gateway implementations can be compatible.

Conceptually, this protocol builds on top of the interface specified in the remainder of this document. It is therefore an “application-level” interface rather than a feature of the core Internet Computer system described in the other sections, and could be a separate document. We nevertheless include it in the Internet Computer Interface Specification because of its important role in the ecosystem and because multiple Gateway implementations need to be kept in sync.

=== Overview

An HTTP request from an HTTP client is handled in the following steps:

1. The Gateway resolves the Host of the request to a canister id.
2. The Gateway Candid-encodes the HTTP request data.
3. The Gateway invokes the canister via a query call to `http_request`.
4. The canister handles the request and returns an HTTP response, encoded in Candid, together with additional metadata.
5. If requested by the canister, the Gateway sends the request again via an update call to `http_request_update`.
6. If applicable, the Gateway fetches further body data via streaming query calls.
7. If applicable, the Gateway validates the certificate of the response.
8. The Gateway sends the response to the HTTP client.

[#http-gateway-interface]
=== Candid interface

The following interface description, in https://github.com/dfinity/candid/blob/master/spec/Candid.md[Candid syntax], describes the interface that canisters are expected to provide. You can also link:{attachmentsdir}/http-gateway.did[download the file].
----
include::{example}http-gateway.did[]
----

Only canisters that use the “Upgrade to update calls” feature need to provide the `http_request_update` method.

NOTE: Canisters not using these features can completely leave out the `streaming_strategy` and/or `upgrade` fields in the `HttpResponse` they return, due to how Candid subtyping works. This might simplify their code.

[#http-gateway-name-resolution]
=== Canister resolution

The Gateway needs to know the canister id of the canister to talk to, and obtains that information from the hostname as follows:

1. Check that the hostname, taken from the `Host` field of the HTTP request, is of the form `<name>.raw.ic0.app` or `<name>.ic0.app`, or fail.

2. If the `<name>` is in the following table, use the given canister ids:
+
.Canister hostname resolution
|============================================
| Hostname | Canister id
| `identity` | `rdmx6-jaaaa-aaaaa-aaadq-cai`
| `nns` | `qoctq-giaaa-aaaaa-aaaea-cai`
| `dscvr` | `h5aet-waaaa-aaaab-qaamq-cai`
| `personhood` | `g3wsl-eqaaa-aaaan-aaaaa-cai`
|============================================

3. Else, if `<name>` is a valid textual encoding of a principal, use that principal as the canister id.

4. Else fail.

If the hostname was of the form `<name>.ic0.app`, it is a _safe_ hostname; if it was of the form `<name>.raw.ic0.app` it is a _raw_ hostname.
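The resolution steps can be sketched in Python as follows. This is an illustration under stated assumptions, not the normative algorithm: the principal check is only a shape test (a real Gateway would decode the base32 text and verify its CRC32 checksum), stripping a port from the `Host` value is an assumption, and `resolve_canister`, `KNOWN_NAMES` and `PRINCIPAL_SHAPE` are hypothetical names.

```python
import re
from typing import Tuple

# Fixed name-to-canister mapping, copied from the table above.
KNOWN_NAMES = {
    "identity": "rdmx6-jaaaa-aaaaa-aaadq-cai",
    "nns": "qoctq-giaaa-aaaaa-aaaea-cai",
    "dscvr": "h5aet-waaaa-aaaab-qaamq-cai",
    "personhood": "g3wsl-eqaaa-aaaan-aaaaa-cai",
}

# Simplified shape test for a textual principal: dash-separated groups of five
# lowercase base32 characters, the last group possibly shorter. A real Gateway
# would also decode the base32 and verify the embedded CRC32 checksum.
PRINCIPAL_SHAPE = re.compile(r"[a-z2-7]{5}(-[a-z2-7]{5})*(-[a-z2-7]{1,5})?")

RAW_SUFFIX = ".raw.ic0.app"
SAFE_SUFFIX = ".ic0.app"


def resolve_canister(host: str) -> Tuple[str, bool]:
    """Map a Host value to (canister_id, is_safe); raise ValueError on failure."""
    hostname = host.split(":", 1)[0].lower()  # drop an optional port (assumption)
    # Step 1: accept only <name>.raw.ic0.app (raw) or <name>.ic0.app (safe).
    if hostname.endswith(RAW_SUFFIX):
        name, safe = hostname[: -len(RAW_SUFFIX)], False
    elif hostname.endswith(SAFE_SUFFIX):
        name, safe = hostname[: -len(SAFE_SUFFIX)], True
    else:
        raise ValueError(f"unsupported host: {host!r}")
    # Step 2: well-known names from the table.
    if name in KNOWN_NAMES:
        return KNOWN_NAMES[name], safe
    # Step 3: the name is itself a textual principal.
    if PRINCIPAL_SHAPE.fullmatch(name):
        return name, safe
    # Step 4: fail.
    raise ValueError(f"cannot resolve a canister for host: {host!r}")
```

For example, `resolve_canister("nns.ic0.app")` yields the NNS canister id together with `True` (a safe hostname), while `identity.raw.ic0.app` resolves to the identity canister with `False`.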

=== Request encoding

The HTTP request is encoded into the `HttpRequest` Candid structure.

* The `method` field contains the HTTP method (e.g. `GET`), in upper case.

* The `url` field contains the URL from the HTTP request line, i.e. without protocol or hostname, and including query parameters.

* The `headers` field contains the headers of the HTTP request.

* The `body` field contains the body of the HTTP request, without the Gateway undoing any content encoding.
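As a sketch of this field mapping, the Python below builds the `HttpRequest` record as a plain dictionary. Candid serialization itself is left out (a real Gateway would hand the record to a Candid library), `encode_request` and `request_url` are hypothetical helpers, and the handling of absolute-form request targets is an assumption of this sketch.

```python
from typing import Dict, List, Tuple
from urllib.parse import urlsplit

HeaderField = Tuple[str, str]  # mirrors `record { text; text; }`


def request_url(target: str) -> str:
    """Path plus query string, without protocol or hostname."""
    # Origin-form targets ("/a?b=c") already have the right shape; strip the
    # scheme and host only if the client sent an absolute-form target.
    if "://" in target.split("?", 1)[0]:
        parts = urlsplit(target)
        return (parts.path or "/") + (f"?{parts.query}" if parts.query else "")
    return target


def encode_request(method: str, target: str,
                   headers: List[HeaderField], body: bytes) -> Dict[str, object]:
    """Fields of the Candid `HttpRequest` record, prior to Candid serialization."""
    return {
        "method": method.upper(),    # e.g. "GET"
        "url": request_url(target),  # e.g. "/index.html?lang=en"
        "headers": list(headers),    # passed through in order
        "body": bytes(body),         # raw bytes, no content decoding
    }
```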

=== Upgrade to update calls

If the canister sets `upgrade = opt true` in the `HttpResponse` reply from `http_request`, then the Gateway ignores all other fields of the reply. The Gateway performs an _update_ call to `http_request_update`, passing the same `HttpRequest` record as the argument, and uses that response instead.

The value of the `upgrade` field returned from `http_request_update` is ignored.
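A minimal sketch of this rule in Python, with the query and update calls abstracted as hypothetical callables (a real Gateway would issue them through an IC agent library). Candid `opt bool` is modelled as `True`, `False`, `None` or an absent key; `fetch_response` is an illustrative name.

```python
from typing import Callable, Dict

# Hypothetical stand-ins for IC calls: each takes a method name and the
# HttpRequest record and returns the decoded HttpResponse record.
Call = Callable[[str, Dict], Dict]


def fetch_response(request: Dict, query: Call, update: Call) -> Dict:
    """Query `http_request`; re-send as an update call if the canister asks."""
    response = query("http_request", request)
    # `opt true` is modelled as True; an absent field (allowed by Candid
    # subtyping) or `null` means "no upgrade".
    if response.get("upgrade") is True:
        # All other fields of the query reply are ignored. The same request is
        # sent to `http_request_update`, whose `upgrade` field is not inspected.
        response = update("http_request_update", request)
    return response
```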

=== Response decoding

The Gateway assembles the HTTP response from the given `HttpResponse` record:

* The HTTP response status code is taken from the `status_code` field.

* The HTTP response headers are taken from the `headers` field.
Contributor Author

Is the boundary node mangling or filtering the headers in any way that should be noted here?

The filtering we have is in dfinity/ic/ic-os/boundary-guestos/rootfs/etc/nginx/conf.d/001-ic-nginx.conf.

The short of it is we add these response headers (and thus their implications) to every HTTP call response:

access-control-allow-origin: *
access-control-allow-methods: GET, POST, HEAD, OPTIONS
access-control-allow-headers: DNT,User-Agent,X-Requested-With,If-Modified-Since,Cache-Control,Content-Type,Range,Cookie
access-control-expose-headers: Content-Length,Content-Range
x-cache-status: MISS

Contributor Author

I guess the question is: Why set these? Shouldn’t the canister have control over them?

The short answer is "it's complicated". The longer answer is "because headers are not yet certified". As such we currently limit headers a bit to improve security. Once we get certified headers, we should not have any more need for this limitation.

Contributor Author

Ok, good enough for this PR I guess.

+
NOTE: Some Gateway implementations may not be able to pass on all headers. In particular, Service Workers cannot pass on the `Set-Cookie` header.
+
[NOTE]
====
HTTP Gateways may add additional headers. In particular, the following headers may be set:

....
access-control-allow-origin: *
access-control-allow-methods: GET, POST, HEAD, OPTIONS
access-control-allow-headers: DNT,User-Agent,X-Requested-With,If-Modified-Since,Cache-Control,Content-Type,Range,Cookie
access-control-expose-headers: Content-Length,Content-Range
x-cache-status: MISS
....
====

* The HTTP response body is initialized with the value of the `body` field, and further assembled as per the <<http-gateway-streaming,streaming protocol>>.
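A minimal sketch of this assembly step in Python, assuming the body has already been assembled (including any streamed chunks). `build_http_response` and `GATEWAY_HEADERS` are hypothetical names, and letting a canister-supplied header win over a Gateway default of the same name is an assumption, since the text above does not say which takes precedence.

```python
from typing import Dict, List, Tuple

# Headers a Gateway may add, taken from the NOTE above.
GATEWAY_HEADERS = [
    ("access-control-allow-origin", "*"),
    ("access-control-allow-methods", "GET, POST, HEAD, OPTIONS"),
    ("access-control-allow-headers",
     "DNT,User-Agent,X-Requested-With,If-Modified-Since,Cache-Control,"
     "Content-Type,Range,Cookie"),
    ("access-control-expose-headers", "Content-Length,Content-Range"),
    ("x-cache-status", "MISS"),
]


def build_http_response(resp: Dict, body: bytes) -> Tuple[int, List[Tuple[str, str]], bytes]:
    """Turn an HttpResponse record plus the fully assembled body into
    (status, headers, body) for the HTTP client."""
    headers = list(resp["headers"])
    present = {name.lower() for name, _ in headers}
    # Assumption: a header set by the canister is kept, and the Gateway default
    # of the same name (compared case-insensitively) is skipped.
    for name, value in GATEWAY_HEADERS:
        if name not in present:
            headers.append((name, value))
    return resp["status_code"], headers, body
```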

[#http-gateway-streaming]
=== Response body streaming

The HTTP Gateway protocol has provisions to transfer further chunks of the body data from the canister to the HTTP Gateway, to overcome the message size limit of the Internet Computer. This streaming protocol is independent of any streaming of data between the HTTP Gateway and the HTTP client: the Gateway may assemble the whole response before passing it on, or pass the chunks on directly at the TCP or HTTP level, as it sees fit. When the Gateway is <<http-gateway-certification,certifying the response>>, it must not pass on uncertified chunks.

If the `streaming_strategy` field of the `HttpResponse` is set, the HTTP Gateway uses further query calls to obtain additional chunks to append to the body:

1. If the function reference in the `callback` field of the `streaming_strategy` is not a method of the given canister, the Gateway fails the request.

2. Else, it makes a query call to the given method, passing the `token` value given in the `streaming_strategy` as the argument.

3. That query method returns a `StreamingCallbackHttpResponse`. The `body` therein is appended to the body of the HTTP response. If the `token` field contains a token, the Gateway repeats step 2 with that token; it stops once the `token` field is `null`.
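The streaming loop can be sketched in Python as follows, assuming a hypothetical `query` callable that performs a query call and returns the decoded reply. A Candid `func` reference is modelled as a `(principal, method)` pair, and treating a `null` reply from the callback as the end of the stream is an assumption the steps above do not state. All names are illustrative.

```python
from typing import Any, Callable, Dict, Optional

# Hypothetical: query(canister_id, method, arg) performs a query call and returns
# the decoded `opt StreamingCallbackHttpResponse` (a dict, or None for `null`).
QueryCall = Callable[[str, str, Any], Optional[Dict]]


def assemble_body(canister_id: str, response: Dict, query: QueryCall) -> bytes:
    body = bytearray(response["body"])
    strategy = response.get("streaming_strategy")
    if strategy is None:               # absent or `null`: nothing to stream
        return bytes(body)
    callback = strategy["Callback"]
    # A Candid func reference is modelled as (principal, method name).
    principal, method = callback["callback"]
    if principal != canister_id:       # step 1: must be a method of this canister
        raise ValueError("streaming callback points to a different canister")
    token = callback["token"]          # step 2: start with the initial token
    while token is not None:
        chunk = query(canister_id, method, token)
        if chunk is None:              # assumption: a `null` reply also ends the stream
            break
        body += chunk["body"]          # step 3: append the chunk
        token = chunk.get("token")     # continue until the token is `null`
    return bytes(body)
```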

WARNING: The type of the `token` value is chosen by the canister; the HTTP Gateway obtains the Candid type of the encoded message from the canister, and uses it when passing the token back to the canister. This generic use of Candid is not covered by the Candid specification, and may not be possible in some cases (e.g. when using “future types”). Canister authors may have to use “simple” types.


[#http-gateway-certification]
=== Response certification

If the hostname was safe, the HTTP Gateway performs _certificate validation_:

1. It searches for a response header called `Ic-Certificate` (case-insensitive).

2. The value of the header must be a structured header according to RFC 8941 with fields `certificate` and `tree`, both being byte sequences.

3. The `certificate` must be a valid certificate as per <<certification>>, signed by the root key. If the certificate contains a subnet delegation, the delegation must be valid for the given canister. The timestamp in `/time` must be recent. The subnet state tree in the certificate must reveal the canister’s <<state-tree-certified-data,certified data>>.

Currently neither icx-proxy nor the service worker checks the certificate /time (which is bad). Can you give some more specific measure of "recent"? 5 minutes?

Contributor Author

That’s probably for the security team to decide. 5 min should be ample (until we get into edge node caching, then things become tricky, because a cache is hardly different from an attacker serving old data).


4. The `tree` must be a hash tree as per <<certification-encoding>>.

5. The root hash of that `tree` must match the canister’s certified data.

6. The path `["http_assets",<url>]`, where `<url>` is the UTF-8 encoded `url` field of the `HttpRequest`, must exist and be a leaf. Else, if it does not exist, `["http_assets","/index.html"]` must exist and be a leaf.

7. That leaf must contain the SHA-256 hash of the _decoded_ body.
+
The decoded body is the body of the HTTP response (in particular, after assembling streaming chunks), decoded according to the `Content-Encoding` header, if present. Supported encodings for `Content-Encoding` are `gzip` and `deflate`.
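Steps 1, 2, 6 and 7 (plus the freshness part of step 3) can be sketched in Python as below. Verifying the certificate signature, the subnet delegation and the tree's root hash (steps 3 to 5) needs BLS verification and hash-tree reconstruction and is left out. `lookup_leaf` stands in for a hash-tree lookup and, like every other name here, is hypothetical; the five-minute freshness window and reading `deflate` as zlib-wrapped data are assumptions.

```python
import base64
import gzip
import hashlib
import zlib
from typing import Callable, List, Optional, Tuple

Headers = List[Tuple[str, str]]
# Hypothetical hash-tree lookup: returns the leaf value for a path, None if the
# path is absent, and raises ValueError if the path exists but is not a leaf.
LookupLeaf = Callable[[List[bytes]], Optional[bytes]]

# "Recent" is not quantified above; five minutes is only the figure floated in
# the review discussion on step 3, used here as an assumed default.
MAX_CERT_AGE_NS = 5 * 60 * 1_000_000_000


def find_header(headers: Headers, name: str) -> Optional[str]:
    """Step 1: case-insensitive header lookup (first match wins)."""
    for key, value in headers:
        if key.lower() == name.lower():
            return value
    return None


def parse_ic_certificate(value: str) -> Tuple[bytes, bytes]:
    """Step 2: minimal RFC 8941 dictionary parsing, byte-sequence members only
    (parameters and other item types are not handled)."""
    fields = {}
    for member in value.split(","):
        key, _, item = member.strip().partition("=")
        if len(item) < 2 or item[0] != ":" or item[-1] != ":":
            raise ValueError(f"{key!r} is not a byte sequence")
        fields[key] = base64.b64decode(item[1:-1], validate=True)
    if "certificate" not in fields or "tree" not in fields:
        raise ValueError("Ic-Certificate needs both certificate and tree")
    return fields["certificate"], fields["tree"]


def is_recent(cert_time_ns: int, now_ns: int, max_age_ns: int = MAX_CERT_AGE_NS) -> bool:
    """Part of step 3: the certificate's /time must be close to the local clock."""
    return abs(now_ns - cert_time_ns) <= max_age_ns


def decode_body(body: bytes, content_encoding: Optional[str]) -> bytes:
    """Step 7: undo gzip or deflate content encoding, if any."""
    if content_encoding is None:
        return body
    encoding = content_encoding.strip().lower()
    if encoding == "gzip":
        return gzip.decompress(body)
    if encoding == "deflate":
        return zlib.decompress(body)  # assumes zlib-wrapped data (RFC 1950)
    raise ValueError(f"unsupported Content-Encoding: {content_encoding!r}")


def body_matches_tree(url: str, headers: Headers, body: bytes,
                      lookup_leaf: LookupLeaf) -> bool:
    """Steps 6 and 7, for a tree that already passed steps 3 to 5."""
    leaf = lookup_leaf([b"http_assets", url.encode("utf-8")])
    if leaf is None:  # fall back only when the requested path is absent
        leaf = lookup_leaf([b"http_assets", b"/index.html"])
    if leaf is None:
        return False
    decoded = decode_body(body, find_header(headers, "Content-Encoding"))
    return hashlib.sha256(decoded).digest() == leaf
```

The fallback to `/index.html` happens only when the lookup reports the path as absent; a path that exists but is not a leaf makes the hypothetical lookup raise, which fails the request as step 6 requires.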

WARNING: The certification protocol only covers the mapping from request URL to response body. It completely ignores the request method and headers, and does not cover the response headers and status code.


[#abstract-behavior]
== Abstract behavior