Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Backend fetch http #1706

Merged
merged 21 commits into from
Mar 25, 2024
Merged

Backend fetch http #1706

merged 21 commits into from
Mar 25, 2024

Conversation

lastmjs
Copy link
Member

@lastmjs lastmjs commented Mar 12, 2024

The purpose of this PR is to provide general-purpose http capabilities from fetch. The developer should not need to understand that they are on ICP to do basic GET and POST requests. These are currently the only types of requests that are supported. I'm not sure how important it is to provide nice errors if the user tries to use unsupported methods, headers, etc...they will still get an error by the system at least for the methods I would guess.

They shouldn't have to do anything ICP-specific as I said, unfortunately there are some settings on http requests (the management canister is the underlying mechanism to actually perform these requests) that the developer must set. So we have come up with a global method on ic to set all of these settings. They can call this before making http requests to set those settings.

If they don't call this, they will use default settings that could be very expensive and perhaps not calibrated correctly. I believe I have the default settings pretty good for most use cases, as in it should work and not throw errors.

Generally they should set the global settings. We would like to push DFINITY/ICP to remove the need to specify these extra settings.

I have some basic tests, but maybe now or maybe in another PR it would probably be good to add a more robust set of tests.

This is a basic implementation, I'm not sure we should go for the entire fetch API right now, but the happy path. For example on the frontend fetch we already decided not to implement accepting the Request object, we might want to do the same here.

@bdemann bdemann mentioned this pull request Mar 12, 2024
@bdemann bdemann changed the base branch from main to tfjs March 13, 2024 16:39
@bdemann
Copy link
Member

bdemann commented Mar 13, 2024

What is the feature?

Adding fetch http to the backend

Does it have an issue?

No. Looks like its a bullet point of #1691

What is it doing?

Explanation

I would expect this to be pretty similar to the front end, in that we should
need to figure out that it's just a plain http request and pass it into the
original fetch? No that's not right, on the backend we don't have an original
fetch. We have to do all of the http calls through the management canister.
So I would expect this to just be an adapter for the http protocol in fetch to
the ic.call.

So all of those weird header things that we were worrying about we will also
have to worry about here. So I guess before really diving into this I ought to
familiarize myself with fetch and the http protocol it implements.

Questions

  • What is the api for the management canister http request?
  • How does that translate from the fetch?
    • Request object?
    • Options?
    • How are headers represented by fetch?
      • Headers object, or an object literal with string values.
  • Do they have the same return type?
    • If not how do they translate?
  • Are there things that fetch supports that the management canister does not
    • visa versa
  • How do we handle the cycles cost?
  • What does it mean that it will return an HTTP response, possibly after a
    transformation. What on earth is that transformation and why do we need it?
  • What can we learn from the interface spec?
  • How are we going to handle all of the other http methods besides GET, POST, and HEAD?
  • What is the max_response_bytes?
  • What is the transform?
  • Why would you want to specify a max less than the 2MB max?
    • if its to decrease the cycles cost why isn't that calculated
      automatically?
      • Yes the price will be based off this value.
      • So why isn't that calculated automatically based off the actual
        response bytes?
  • Why is the developer responsible for sending the cycles amount in the call?
    • what if they don't know the price?
    • what happens if they don't do enough?
      • does it fail?
      • if it fails do they loose all of those cycles?
      • I'm just imagining a situation where it cost 1 more cycle then what they specified and they loose it all of lack of one more cycle.
  • Do we need to handle the 8192 url max? Or will we just let rust handle that
    for us when we pass it the url?
    • Same question but for the limit of 64 headers
    • Same question but for 8 KiB limit on header names and values
    • Same question but for the 48 KiB limit on all total header names and values
    • Same question but for IPv4 addresses
  • The transform function must be exported by the canister. How do we handle that
    in azle?
  • Cycles to pay for the call must be explicitly transferred with the call,
    i.e., they are not deducted from the caller's balance implicitly (e.g., as for
    inter-canister calls).
    • why not?
  • Is fetch still experimental in node.js? It was in 2022?
    • No, in Node 21 global fetch is now stable
  • What are these options
    • What is subresource integrity value??
    • What is the request mode?
    • All of the other ones I don't understand?

Answers

What is the api for http requests?

Under the hood we will be calling the management canister through the rust api.
Under the management canister it has the
http_request that we will be using. It has some structs and enums
that will come into play later I'm guessing. Lets start with http_request (I
don't think we will be using http_request_with_closure at all).

Basically http_request makes an http request. Pretty straight forward. Let's see
The CanisterHttpRequestArgument is probably going to be what we need to look at.
We should also be aware of the cycles cost for this call. I have added a question
about how we are going to address that.

I am also told to look at the interface spec for more information.

We will take a look at that later.

CanisterHttpRequestArgument

This has

  • a string url, that's fine
  • max_response_bytes, what is that?
  • method, I'm guessing that's an Enum with POST, GET, etc. Hmm it is but it only has POST, GET, and HEAD...
  • A list of headers
  • an optional body of bytes
  • and a transform, what is that?

I am hoping that all of that will make more sense after reading the interface spec, so let's jump to that now.

What can we learn from the Interface Spec

The canister should ait to issue idempotent requests. I think that's on our
developers and not on us, so I'm going to disregard that. Although that is a
troubling limitation that I hadn't thought of before...

Ah, so the transform function is used to sanitize a response so that things
like timestamps that would otherwise make two identical requests have different
responses can be removed. Makes sense, but also makes me further concerned by
the limitations of making http requests from a canister. Making them to a
canister doesn't seem too bad mind you, but if your server relies on another
server... that makes me nervous...

Oh, and the canister will only have access to the request after it's sanitized,
which I guess makes sense because if you have a replicated function working on
different date than it's other replicated parts then that would mess with the
replication.

The spec owns up to only supported GET, HEAD, and POST methods. With the
implication that that is only how it is right now and there might be support in
the future.

For security reasons only https is supported.

The max size for a request or response is 2_000_000B. That makes sense, it's
the same as the message size limit. You can configure the size of the response
to be less than 2 MB but I still don't know why you would want to do that. I
think I remember that Jordan mentioned that the price will be linked to that
size. But why not just have the price be calculated based on what is actually
returned?

Cool, I got a lot more questions from that... Let's keep plugging along on those

What does the fetch api look like for the http protocol?

In a simple example it looks like this

const response = await fetch(
    "http://url.domain",
    {
        method: "POST",
        headers: {
            "My-Header-Name": "My-Header-Value"
        },
        body: [1, 2, 3, 4, 5, 6, 7]
    }
);

But I'm guessing I'm going to need more that just a little easy start up demo
to make sure we are adequately implement the api.

I recently learned that undici is the
project the implement the global fetch for node. Let's see if they have any
documentation that could help inform this. Secifically they have a common API
methods

section in their readme that will probably be helpful.

For right now I would expect us only to implement the most common API features.
And even then I don't know if it is expedient for us to do all of those right
away as our highest priority.

Even more specifically they have a section about
fetch
which has links to mozilla developer
docs
and the whatwg
specs
for fetch. That's probably
an even better resource to check out. Let's move over there.

Syntax
fetch(resource)
fetch(resource, options)

resource is either a url (a string or anything with a stringifier (included URL) that will resolve to a url essentially) or a Request object.

And then options is an object with a bunch of options.

If I can just look through what the Request object is and briefly cover all of
the options I think we'll be set. I'm guessing there is going to be a fair
amount of overlap.

Request Object

It has 14 properties.

  • body
  • if the body was used in a request yet
  • cache mode (default, reload, no-cache)?
  • credentials?
  • destination (or content type)?
  • headers
  • subresource integrity value??
  • method (POST, GET, etc)
  • mode (cors, no-cors, same-origin, navigate)?
  • mode for how redirects are handled?
  • referrer?
  • referrer policy?
  • signal?
  • url
  • Those not part of options
    • bodyUsed
    • destination
    • url

I think most of those are pretty straight forward (except the ones with ?)...
So most of them aren't...

I guess I would expect our code to be able to parse all of that and add it to
the http_request appropriately, if applicable

Options
  • That are part of the request body
    • body
    • cache
    • credentials
    • headers
    • integrity
    • method
    • mode
    • redirect
    • referrer
    • referrerPolicy
    • signal
  • That are not
    • browsing topics
    • keep alive?
    • priority

At this point I am feeling a little overwhelmed with each of these. So I am
going to put them aside for now and look at each one as we get to it in the
review.

Expectations

Tests

  • Do I want to see each option represented in at least one test?
  • I might want to see the tests for each of ic.call and icp protocol and http
    protocol
  • I want to see a test with a url string URL object and a Request object.
  • I want to see a test with a transform function
  • I want to see a test with each of the three allowed methods
  • Representatives of the different header approaches
  • Maybe I want to see test that hit the url length max and header maxes

Code

Code should

  • have a good interface for adding a transform function
  • should alert the user if they try and use something besides GET, HEAD, and POST
    • either through a warning, or error. Probably an error and throw I don't
      see how you guess what the right header to use is. I guess I would expect us
      to leverage typescript to have a type that will warn them before compilation
      that it's only those three headers that are allowed.
  • only allow https calls
  • export the transform function. Or require the developer to do so
  • Implement a good selection the most common api features of node's global fetch
  • Be able to handle a request object
  • Be able to handle all of the options
    • Even if we don't support them I would expect them to be handled
  • Handle both headers in a header object and object with string literals
  • some way to handle cycles costs and max response size. Based on design
    discussions I am expecting that they will be optional fields which if left blank
    will charge the max. Which after reading the ic spec I agree is the correct move
    since it is the move that they themselves make.
  • I expect the code or documentation should enforce or inform the developer that
    their transform function needs to be exported
  • Maybe there is handling for the 8192 url max
    • Same expectation for the limit of 64 headers
    • Same expectation for 8 KiB limit on header names and values
    • Same expectation for the 48 KiB limit on all total header names and values
    • Same expectation for IPv4 addresses

@bdemann
Copy link
Member

bdemann commented Mar 13, 2024

Responses

I almost forgot to look at responses
The response for fetch is going to be made like this

new Response()
new Response(body)
new Response(body, options)

body is either null (by default) or one of

  • blob
  • Array Buffer
  • Typed Array
  • DataView
  • Form Data
  • ReadableStream
  • URL Search Params
  • String
  • string literal

options are

  • status (default is 200)
  • status text (such as "OK" default is "")
  • headers (by default is empty, is (like request) made of either a Headers object or object literal with string key/value pairs).

In rust for http_request the response looks like
a body of just bytes
a list of Headers
a Nat for status

Headers are going to be like this

pub struct HttpHeader {
    pub name: String,
    pub value: String,
}

I would expect the code to also properly covert from rust response to http response.

I think I would expect it to always be a blob body, with a Headers object, and
a status.

Copy link
Member

@bdemann bdemann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Results

Rubric

  1. Consistency (Same as the rest of the code and itself)
    • There are a handful of new any types.
  2. Readability (Easy to understand)
    • return value for the http protocol is a little hard to read at first glance
      • could be more declarative
    • azleFetch is starting to get pretty long, could we break it up into smaller functions
      • slitting along protocol lines seems like a good natural breaking point
    • I like the updates to getBalance and getBlockByNumber in the ethereum json rpc example
  3. Maintainability (modular, reusable, and simple)
    • I like the reuse of azleFetch in the http protocol implementation
  4. Correctness (as bug free as reasonable)
    • Discussed in detail below
  5. Code level implications (code coverage, error coverage, conditionals coverage, efficiency, etc)
    • Discussed in detail below
  6. Quality of tests and documentation (thinking about how others will use it)
    • Discussed in detail below

Code

Lets start with responses so we don't forget them again. Responses are not
quite what I was expecting, I have added a comment to that effect. Overall
I do like the http implementation, just using the icp protocol to call the
management canister as we had been doing before. I guess on that note we used
to have responses from azleFetch that were never like a Response object and it
seemed to work then ¯\_(ツ)_/¯

I like the api for adding a transform function. I'm just a little nervous about
how to best teach people what a transform method is, why they need it, and
making sure that they have added it as a query or update method.

There was a TODO to address unsupported methods. I think that should be pretty
easy to add and would be beneficial. Or at least as beneficial as the odds of
someone doing a method besides POST or GET.

It allows http calls. It shouldn't

Exporting transform functions

Knowing what I now do about transform functions the developer must provide one
if one is needed. That is not something we can do. Hopefully they don't need
one, but what happens if they do and they don't provide one, or they make one,
but they don't add it as a query method?

Implement a good selection the most common api features of node's global fetch

I wrote this one before diving into the mozilla developer docs. I'm not sure it
actually applies like I thought it would when I wrote it. because the protocol
is pretty straight forward. We would just need to make sure we cover all of the
options. Or that we can pass the buck to the management canister for not
supported all of the options.

What options do we support?
It looks like almost none of them. The common ones were

  • body
  • cache
  • credentials
  • headers
  • integrity
  • method
  • mode
  • redirect
  • referrer
  • referrerPolicy
  • signal

We don't currently support URL or Request objects for the fetch input, but I
think those would be very easy to add support for, especially given that we just
ignore the options that aren't body, headers, and method.

Tests

  • Do I want to see each option represented in at least one test?
    • Right now options aren't supported so I won't expect these until we add
      them
  • I might want to see the tests for each of ic.call and icp protocol and http
    protocol
    • Actually I don't think we need this because http is calling the icp
      protocol so leaving it as is should provide this coverage.
  • I want to see a test with a url string URL object and a Request object.
    • I think we should support this and add tests for it.
  • I want to see a test with a transform function
    • We already have that in both of the examples that this pr touches.
  • I want to see a test with each of the three allowed methods
    • We have GET and POST tests, but no HEAD tests.
  • Representatives of the different header approaches
    • We don't have any tests for headers and consequently we didn't catch
      that they are being completely disregarded
  • Maybe I want to see test that hit the url length max and header maxes
    • We don't currently enforce those limits, if we don't then it's probably
      not important, and would be on dfinity to test those, if we do then we
      should test them.

Misc Concerns

It seems like doing a canister side fetch call is going to very quickly start to
require some deep IC knowledge.
- Transform functions (which brings query/update functions (which brings candid)), cycle costs, replication and idempotent requests

src/lib/fetch.ts Outdated Show resolved Hide resolved
src/lib/fetch.ts Outdated Show resolved Hide resolved
src/lib/fetch.ts Outdated Show resolved Hide resolved
src/lib/fetch.ts Outdated Show resolved Hide resolved
src/lib/fetch.ts Outdated Show resolved Hide resolved
examples/ethereum_json_rpc/src/index.ts Show resolved Hide resolved
src/lib/fetch.ts Outdated Show resolved Hide resolved
src/lib/fetch.ts Outdated Show resolved Hide resolved
src/lib/fetch.ts Outdated Show resolved Hide resolved
src/lib/fetch.ts Outdated Show resolved Hide resolved
@bdemann bdemann changed the base branch from tfjs to main March 13, 2024 21:49
@lastmjs lastmjs force-pushed the backend_fetch_http branch 2 times, most recently from 34514e6 to f814835 Compare March 21, 2024 04:30
@lastmjs lastmjs mentioned this pull request Mar 22, 2024
17 tasks
@lastmjs lastmjs merged commit 0f3484e into main Mar 25, 2024
120 of 122 checks passed
@lastmjs lastmjs deleted the backend_fetch_http branch March 29, 2024 15:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants