
Implement HTTP caching #3231

Closed
jeswr opened this issue May 9, 2024 · 35 comments · Fixed by #3562
Labels
enhancement New feature or request interceptors Pull requests or issues related to Dispatcher Interceptors

Comments

@jeswr

jeswr commented May 9, 2024

This would solve...

Performance and robustness issues associated with fetching the same HTTP resource multiple times.

The implementation should look like...

Use an interceptor(?) to implement HTTP caching based on the Cache-Control header. I'm inclined to think the Expires header should not be implemented, as HTTP/1.0 has extremely limited use today.
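A minimal sketch of the Cache-Control handling this proposal describes, assuming a simple directive parser and storability check (function names are illustrative, not undici API; quoted directive values such as no-cache="set-cookie" are not handled here):

```javascript
// Parse a Cache-Control header into a directive map. Value-carrying
// directives like max-age become integers; bare flags like no-store
// become boolean true. Quoted values are out of scope for this sketch.
function parseCacheControl (header) {
  const directives = {}
  for (const part of header.split(',')) {
    const [name, value] = part.trim().split('=')
    if (!name) continue
    directives[name.toLowerCase()] = value === undefined ? true : parseInt(value, 10)
  }
  return directives
}

// Rough storability check: no-store forbids caching entirely, and a
// shared cache (which an undici-level cache would be) must not store
// responses marked private.
function isStorable (directives) {
  return !directives['no-store'] && !directives.private
}

const d = parseCacheControl('public, max-age=3600, stale-while-revalidate=60')
// d['max-age'] === 3600, isStorable(d) === true
```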

Additional context

Originally raised in #3221.

@jeswr
Author

jeswr commented May 9, 2024

I've looked into implementing this as an interceptor, and need some information in order to proceed:

  • Is there existing code for caching and cloning responses in the codebase that can be reused? (For this I was expecting to be able to use logic from the web Cache, which seems to implement the Service Worker spec.)
  • Is RFC 7234 implemented anywhere in the codebase? (I cannot find any indication that it is.) If not, http-cache-semantics seems to be a good reference implementation of it; however, it is under the BSD-2-Clause license, so @kornelski would need to give permission for its code to be reused in this library without copying the BSD license over.

@metcoder95
Member

Is there existing code for caching and cloning responses in the codebase that can be reused? (For this I was expecting to be able to use logic from the web Cache, which seems to implement the Service Worker spec.)

Can you elaborate on what you have in mind?

Is RFC 7234 implemented anywhere in the codebase? (I cannot find any indication that it is.) If not, http-cache-semantics seems to be a good reference implementation of it; however, it is under the BSD-2-Clause license, so @kornelski would need to give permission for its code to be reused in this library without copying the BSD license over.

The closest implementation would be the fetch one, but I don't believe it implements the full RFC 7234 spec, so this will be a new implementation for the interceptor.

@Uzlopak
Contributor

Uzlopak commented May 9, 2024

I'm somewhat interested in implementing it...

@jeswr
Author

jeswr commented May 9, 2024

Can you elaborate on what you have in mind?

I was trying to work out the easiest way to cache the response object in memory, given that I assume it will arrive with the body as a readable stream that needs to be cloned. I was assuming the web Cache would already need to implement such behavior. I'll step out of the implementation-detail discussion now if someone else is willing to take over.

I'm somewhat interested in implementing it...

@Uzlopak I'm very time-poor, so I'm more than happy to step back on my side! Note you'll probably want to implement RFC 9111, which appears to obsolete RFC 7234.

@Uzlopak
Contributor

Uzlopak commented May 9, 2024

I am also not very time "rich", but I have already invested some hours reading RFC 9111.

The only thing that was not clear to me is how we ensure that we don't open up a security issue :D

What I mean is:

In a browser context you have one user, and fetch calls can be cached because the cache is user-scoped. In Node we don't have users, so an HTTP cache would be globally activated (?) in Node.js.

So we'd need something like setGlobalDispatcher but for caching? setGlobalHttpCache?

@mcollina
Member

I would start with an interceptor or Agent that implements the HTTP caching protocol, and then worry about how we enable it all the time.

I think it's possible to have it always enabled and not cache any private data: that's how CDNs work, after all. Before worrying about this, we need the implementation ;).

@Uzlopak
Contributor

Uzlopak commented May 10, 2024

Ok, i will start tonight. ;)

@kornelski

I implemented http-cache-semantics by reading the RFC start to finish and translating each paragraph into code.

I highly recommend that approach, because it results in good comments with references to their RFC sections (I wish I had kept more of them in my code), and it helps ensure that non-obvious behaviors are not missed.

@Uzlopak
Contributor

Uzlopak commented May 10, 2024

Usually when I implement an RFC or any other spec, I print it out. Then, as you say, I use the text to comment the code. When I've implemented a paragraph, I mark it in the printout as implemented too: yellow marker for implemented, pink marker for has an issue and should be checked again. When the whole text is colored, I know the spec is completely implemented ;).

@metcoder95 metcoder95 added the interceptors Pull requests or issues related to Dispatcher Interceptors label May 12, 2024
@jeswr
Author

jeswr commented Jun 26, 2024

Any updates on this?

@metcoder95
Member

@Uzlopak?

@metcoder95
Member

ping

@flakey5
Member

flakey5 commented Aug 18, 2024

Hey all, I can start taking a look at implementing this this week. I need to get up to speed on RFC 9111 first, but ideally I'd be able to get a conversation started on some implementation details by the end of next week.

Also, related: #2760 #2256 #1146

cc @mcollina

@ronag
Member

ronag commented Aug 19, 2024

We have already been working a bit on this. nxtedition/nxt-undici#3

Maybe we should sync our efforts?

@flakey5
Member

flakey5 commented Aug 19, 2024

Maybe we should sync our efforts?

That sgtm, how much progress have y'all made on it?

@flakey5
Member

flakey5 commented Aug 24, 2024

Basing off of some ideas discussed here, as well as ones in nxtedition/nxt-undici, what do y'all think of this API?

interface CacheInterceptorGlobalOptions {
  // Function for creating the cache key
  makeKey?: (opts: Dispatcher.RequestOptions) => string;
  store?: CacheStore;
}

// Interface responsible for storing the cached responses.
interface CacheStore {
  get(key: string): Promise<string>;
  put(opts: CachePutOptions): Promise<void>;
}

type CachePutOptions = {
  key: string;
  value: string;
  size: number;
  vary?: string;
  expires?: number;
  // Subject to change depending on implementation specifics, but the idea is to
  //  provide the cache store with all the info we're given in the cache control
  //  directives
};

Example usage

const { Client } = require('undici')
const cacheInterceptor = require('../lib/interceptor/cache')

const client = new Client('https://google.com')
  .compose(cacheInterceptor({
    makeKey: (opts) => `${opts.origin}:${opts.method}:${opts.path}`,
    store: {
      get: async (key) => {/* ... */},
      put: async (opts) => {/* ... */}
    }
  }))

client.request({ path: '/', method: 'GET' }).then(/* ... */);

Defaults

makeKey and store are optional since I think we should have builtin defaults.

The spec recommends, in Section 2, that the cache key be composed of at least the request's method and target URI.

For the default cache store, I think we could use node:sqlite like in nxtedition/nxt-undici#4, with a schema like:

CREATE TABLE IF NOT EXISTS cacheInterceptor(
  key TEXT PRIMARY KEY NOT NULL,
  value TEXT NOT NULL,
  vary TEXT,
  size INTEGER,
  expires INTEGER
  -- Subject to change depending on implementation specifics
) STRICT;

CREATE INDEX IF NOT EXISTS idxCacheInterceptorExpires ON cacheInterceptor(expires);

For purging old responses, we can go with the same approach as in nxtedition/nxt-undici#4 and purge them in the put call.
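As a rough illustration of the store shape proposed above, here is a hypothetical in-memory CacheStore that purges expired entries on write, standing in for the node:sqlite table (class and field names are illustrative, not undici's implementation):

```javascript
// In-memory stand-in for the proposed CacheStore interface. Entries
// carry the same fields as CachePutOptions (value, vary, expires);
// expired rows are deleted inside set(), mirroring the purge-on-write
// approach referenced from nxtedition/nxt-undici#4.
class MemoryCacheStore {
  constructor () {
    this.entries = new Map()
  }

  async get (key) {
    const entry = this.entries.get(key)
    if (!entry) return undefined
    // Treat an entry at or past its expiry as a miss and drop it.
    if (entry.expires !== undefined && entry.expires <= Date.now()) {
      this.entries.delete(key)
      return undefined
    }
    return entry.value
  }

  async set ({ key, value, vary, expires }) {
    // Purge all expired entries on write, like a DELETE ... WHERE
    // expires <= now against the sqlite table would.
    for (const [k, e] of this.entries) {
      if (e.expires !== undefined && e.expires <= Date.now()) {
        this.entries.delete(k)
      }
    }
    this.entries.set(key, { value, vary, expires })
  }
}
```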

@mcollina
Member

Why can't this be simplified to something like:

interface CacheInterceptorGlobalOptions {
  store?: CacheStore;
}

// Interface responsible for storing the cached responses.
interface CacheStore {
  get(key: Dispatcher.RequestOptions): Promise<??>;
  put(key: Dispatcher.RequestOptions, opts: CachePutOptions): Promise<void>;
}

type CachePutOptions = {
  value: string;
  size: number;
  vary?: string;
  expires?: number;
  // Subject to change depending on implementation specifics, but the idea is to
  //  provide the cache store with all the info we're given in the cache control
  //  directives
};

I'm unsure we could always reduce the key to a string without overhead.

@ronag
Member

ronag commented Aug 26, 2024

I think the key can always be {method}:{path}? Unless you're interacting with a buggy server that doesn't set proper Vary headers.

@flakey5
Member

flakey5 commented Aug 26, 2024

I think the key can always be {method}:{path}? Unless you're interacting with a buggy server that doesn't set proper Vary headers.

It probably will be, but IMO it'd still be good to allow it to be overridden if some use case calls for it. It's definitely not needed right now, though, and can always be added later.

@metcoder95
Member

Depending on the scope of the Cache, we might need to also add origin to the mix (as recommended by spec as well).

@ronag
Member

ronag commented Aug 27, 2024

Depending on the scope of the Cache, we might need to also add origin to the mix (as recommended by spec as well).

I thought in this case the server should respond with a Vary header containing host. Adding origin to the key will cause problems (at least for me). Where does the spec say this?

@mcollina
Member

I still didn't get an answer as to why the cache key needs to be a string. I think we should keep it maximally flexible and pass in the request options.

@ronag
Member

ronag commented Aug 27, 2024

I think we should keep it maximally flexible and pass in the request options.

That won't work, though? You can have different request options that produce the same result. That's why the spec says you should set the key to method + path and then use the Vary header to determine which request headers are relevant.

@chrisveness

RFC 9111 Section 2 (https://www.rfc-editor.org/rfc/rfc9111.html#section-2) says

The "cache key" is the information a cache uses to choose a response and is composed from, at a minimum, the request method and target URI used to retrieve the stored response

I can see no mention of 'method+path'; I would infer the URI must be absolute, since different hosts could provide unrelated responses with the same path. Am I missing something?

I thought the ‘vary’ field cannot have a value which includes the host? According to https://httpwg.org/specs/rfc9110.html#field.vary,

The "Vary" header field in a response describes what parts of a request message, aside from the method and target URI, might have influenced the origin server's process for selecting the content of this response.

@Uzlopak
Contributor

Uzlopak commented Aug 27, 2024

method+path === "the request method and target URI"

?!

@chrisveness

https://www.ibm.com/docs/en/cics-ts/6.x?topic=concepts-components-url path = the specific resource in the host

@ronag
Member

ronag commented Aug 27, 2024

as different hosts could provide unrelated responses with the same path. Am I missing something?

If different hosts can provide unrelated responses, then they should reply with Vary: host.

I thought the ‘vary’ field cannot have a value which includes the host? According to https://httpwg.org/specs/rfc9110.html#field.vary,

That doesn't say anything about not including the Host header.

@kornelski

kornelski commented Aug 27, 2024

Cache is always split per URL, and Vary only adds additional granularity within the single URL.

(Cache is technically also split per method, but PUT/POST invalidates cached GET, so in practice you can just cache GET)

Varying by Host is a basic guarantee since RFC2616, and never needs any additional reinforcement. I'm not sure if it's forbidden in Vary, but it's certainly useless there.

@ronag
Member

ronag commented Aug 27, 2024

Cache is always split per URL, and Vary only adds additional granularity within the single URL.

(Cache is technically also split per method, but PUT/POST invalidates cached GET, so in practice you can just cache GET)

Varying by Host is a basic guarantee since RFC2616, and never needs any additional reinforcement. I'm not sure if it's forbidden in Vary, but it's certainly useless there.

Any chance you could refer to the relevant parts in the spec?

@flakey5
Member

flakey5 commented Aug 27, 2024

I still didn't get an answer why we need to have a cache key as a string. I think we should keep it maximally flexible and pass in the request options.

+1

That won't work though? You can have different request options that have the same result. That's why the spec says you should set the key to method + path and then use the vary header to determine what in the request headers is relevant.

It allows each cache store to have its own makeKey (or whatever works best for it). The store gets the full context for a request, can build its own cache key from whatever parts it cares about, and can then use that key for lookups/puts.

@metcoder95
Member

metcoder95 commented Aug 27, 2024

A response's Vary header can and should invalidate the cache if it does not match an already cached request, and a revalidation should be triggered if the criteria change.

This applies if we want to follow the spec strictly.

+1 on a custom way to generate the cache key, as long as we advise that it will cause the cache to deviate from the spec.

@kornelski

A request not matching an existing Vary doesn't invalidate anything; a response with a mismatched Vary invalidates the cache.

A good HTTP cache should have two cache levels:

  1. Look up by the URL (plus the method, if you choose to cache non-GET) to find whether the URL is varying.

  2. If it is varying, then look up the responses cached for this URL by a key derived from the request header values listed in the cached Vary response header.

If the second lookup misses, it doesn't invalidate anything yet; it only goes to fetch a response. If the server responds with the same Vary, just add the response to the second-level cache alongside the others (many responses per URL can exist).

Only when the server sends a different Vary header (including the lack of one) do you discard all existing entries for this URL and start over with this response.
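The two-level scheme described above can be sketched roughly as follows, assuming a plain Map keyed by URL at the first level and by Vary-derived header values at the second (all names are illustrative, not undici's implementation):

```javascript
// Level 1: URL -> { vary, responses }; Level 2: responses is a Map keyed
// by the request header values named in the cached Vary header.
const cache = new Map()

// Derive the second-level key from the request headers listed in Vary.
function varyKey (vary, requestHeaders) {
  return vary.map(h => `${h}:${requestHeaders[h] ?? ''}`).join(',')
}

function lookup (url, requestHeaders) {
  const entry = cache.get(url)
  if (!entry) return undefined // level-1 miss: URL not cached at all
  // Level-2 lookup; a miss here returns undefined without invalidating.
  return entry.responses.get(varyKey(entry.vary, requestHeaders))
}

function store (url, requestHeaders, responseVary, response) {
  let entry = cache.get(url)
  // A response with a different Vary (including none) discards all
  // existing entries for this URL and starts over.
  if (!entry || entry.vary.join() !== responseVary.join()) {
    entry = { vary: responseVary, responses: new Map() }
    cache.set(url, entry)
  }
  // Same Vary: add this response alongside the others for the URL.
  entry.responses.set(varyKey(responseVary, requestHeaders), response)
}
```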

@flakey5
Member

flakey5 commented Sep 4, 2024

Just as an update: I'm planning to have at least a draft PR open by this Friday. Not all of the caching directives or optional behaviors will be supported in it, just to limit its scope (the required ones will be implemented, though, ofc). Follow-up PRs will implement more.

@loynoir

loynoir commented Sep 5, 2024

It would be nice to see a Node cache similar to the Web API Cache.

https://developer.mozilla.org/en-US/docs/Web/API/Cache/matchAll

@metcoder95
Member

I'd recommend prioritizing the interceptor work first; we can then evaluate how to set up/port Cache as a Web API.

flakey5 added a commit to flakey5/undici that referenced this issue Sep 7, 2024
Implements bare-bones http caching as per rfc9111

Closes nodejs#3231
Closes nodejs#2760
Closes nodejs#2256
Closes nodejs#1146

Signed-off-by: flakey5 <73616808+flakey5@users.noreply.github.com>
@flakey5 mentioned this issue Sep 7, 2024