
Implement HTTP caching #3231

Closed
jeswr opened this issue May 9, 2024 · 35 comments · Fixed by #3562
Labels
enhancement New feature or request interceptors Pull requests or issues related to Dispatcher Interceptors

Comments

@jeswr

jeswr commented May 9, 2024

This would solve...

Performance and robustness issues associated with fetching the same HTTP resource multiple times.

The implementation should look like...

Use an interceptor(?) to implement HTTP caching based on the Cache-Control header. I'm inclined to think the Expires header should not be implemented, as HTTP/1.0 has extremely limited use today.
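A minimal sketch of the Cache-Control handling this proposal describes, assuming a simple directive parser and storability check (function names are illustrative, not undici API; quoted directive values such as no-cache="set-cookie" are not handled here):

```javascript
// Parse a Cache-Control header into a directive map. Value-carrying
// directives like max-age become integers; bare flags like no-store
// become boolean true. Quoted values are out of scope for this sketch.
function parseCacheControl (header) {
  const directives = {}
  for (const part of header.split(',')) {
    const [name, value] = part.trim().split('=')
    if (!name) continue
    directives[name.toLowerCase()] = value === undefined ? true : parseInt(value, 10)
  }
  return directives
}

// Rough storability check: no-store forbids caching entirely, and a
// shared cache (which an undici-level cache would be) must not store
// responses marked private.
function isStorable (directives) {
  return !directives['no-store'] && !directives.private
}

const d = parseCacheControl('public, max-age=3600, stale-while-revalidate=60')
// d['max-age'] === 3600, isStorable(d) === true
```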

Additional context

Originally raised in #3221.

@jeswr
Author

jeswr commented May 9, 2024

I've looked into implementing this as an interceptor, and need some information in order to proceed:

  • Is there existing code for caching and cloning responses in the codebase that can be reused? (For this I was expecting to be able to use logic from the web Cache, which seems to implement the Service Worker spec.)
  • Is RFC 7234 implemented anywhere in the codebase? (I cannot find any indication that it is.) If not, http-cache-semantics seems to be a good reference implementation of it; however, it is under the BSD-2-Clause license, so @kornelski would need to give permission for its code to be reused in this library without copying the BSD license over.

@metcoder95
Member

Is there existing code for caching and cloning responses in the codebase that can be reused? (For this I was expecting to be able to use logic from the web Cache, which seems to implement the Service Worker spec.)

Can you elaborate on what you have in mind?

Is RFC 7234 implemented anywhere in the codebase? (I cannot find any indication that it is.) If not, http-cache-semantics seems to be a good reference implementation of it; however, it is under the BSD-2-Clause license, so @kornelski would need to give permission for its code to be reused in this library without copying the BSD license over.

The closest implementation would be the fetch one, but I don't believe it implements the full RFC 7234 spec, so this will be a new implementation for the interceptor.

@Uzlopak
Contributor

Uzlopak commented May 9, 2024

I'm somewhat interested in implementing it...

@jeswr
Author

jeswr commented May 9, 2024

Can you elaborate on what you have in mind?

I was trying to work out the easiest way to cache the response object in memory, given that I assume it will arrive with the body as a readable stream that needs to be cloned. I was assuming the web Cache would already need to implement such behavior. I'll step out of the implementation-detail discussion now if someone else is willing to take over.

I'm somewhat interested in implementing it...

@Uzlopak I'm very time-poor, so I'm more than happy to step back on my side! Note you'll probably want to implement RFC 9111, which appears to obsolete RFC 7234.

@Uzlopak
Contributor

Uzlopak commented May 9, 2024

I am also not very time "rich", but I have already invested some hours reading RFC 9111.

The only thing that was not clear to me is how we ensure that we don't open up a security issue :D

What I mean is:

In a browser context you have one user, and fetch calls can be cached because the cache is user-scoped. In Node we don't have users, so an HTTP cache would be globally activated (?) in Node.js.

So we'd need something like setGlobalDispatcher but for caching? setGlobalHttpCache?

@mcollina
Member

I would start with an interceptor or Agent that implements the HTTP caching protocol, and then worry about how we enable it all the time.

I think it's possible to have it always enabled and not cache any private data: that's how CDNs work, after all. Before worrying about this, we need the implementation ;).

@Uzlopak
Contributor

Uzlopak commented May 10, 2024

Ok, i will start tonight. ;)

@kornelski

I implemented http-cache-semantics by reading the RFC start to finish and translating each paragraph into code.

I highly recommend that approach, because it results in good comments with references to their RFC sections (I wish I had kept more of them in my code), and it helps ensure that non-obvious behaviors are not missed.

@Uzlopak
Contributor

Uzlopak commented May 10, 2024

Usually when I implement an RFC or any other spec, I print it out. Then, as you say, I use the text to comment the code. When I've implemented a paragraph, I mark it in the printout as implemented too: yellow marker for implemented, pink marker for has an issue and should be checked again. When the whole text is colored, I know the spec is completely implemented ;).

@metcoder95 metcoder95 added the interceptors Pull requests or issues related to Dispatcher Interceptors label May 12, 2024
@jeswr
Author

jeswr commented Jun 26, 2024

Any updates on this?

@metcoder95
Member

@Uzlopak?

@metcoder95
Member

ping

@flakey5
Member

flakey5 commented Aug 18, 2024

Hey all, I can start taking a look at implementing this this week. I need to get up to speed on RFC 9111 first, but ideally I'd be able to get a conversation started on some implementation details by the end of next week.

Also, related: #2760 #2256 #1146

cc @mcollina

@ronag
Member

ronag commented Aug 19, 2024

We have already been working a bit on this. nxtedition/nxt-undici#3

Maybe we should sync our efforts?

@flakey5
Member

flakey5 commented Aug 19, 2024

Maybe we should sync our efforts?

That sgtm, how much progress have y'all made on it?

@flakey5
Member

flakey5 commented Aug 24, 2024

Basing off of some ideas discussed here, as well as ones in nxtedition/nxt-undici, what do y'all think of this API?

interface CacheInterceptorGlobalOptions {
  // Function for creating the cache key
  makeKey?: (opts: Dispatcher.RequestOptions) => string;
  store?: CacheStore;
}

// Interface responsible for storing the cached responses.
interface CacheStore {
  get(key: string): Promise<string>;
  put(opts: CachePutOptions): Promise<void>;
}

type CachePutOptions = {
  key: string;
  value: string;
  size: number;
  vary?: string;
  expires?: number;
  // Subject to change depending on implementation specifics, but the idea is to
  //  provide the cache store with all the info we're given in the cache control
  //  directives
};

Example usage

const { Client } = require('undici')
const cacheInterceptor = require('../lib/interceptor/cache')

const client = new Client('https://google.com')
  .compose(cacheInterceptor({
    makeKey: (opts) => `${opts.origin}:${opts.method}:${opts.path}`,
    store: {
      get: async (key) => {/* ... */},
      put: async (opts) => {/* ... */}
    }
  }))

client.request({ path: '/', method: 'GET' }).then(/* ... */);

Defaults

makeKey and store are optional since I think we should have builtin defaults.

The spec recommends, in Section 2, that the cache key be composed of at least the request's method and target URI.

For the default cache store, I think we could use node:sqlite like in nxtedition/nxt-undici#4, with a schema like:

CREATE TABLE IF NOT EXISTS cacheInterceptor(
  key TEXT PRIMARY KEY NOT NULL,
  value TEXT NOT NULL,
  vary TEXT,
  size INTEGER,
  expires INTEGER
  -- Subject to change depending on implementation specifics
) STRICT;

CREATE INDEX IF NOT EXISTS idxCacheInterceptorExpires ON cacheInterceptor(expires);

For purging old responses, we can go with the same approach as in nxtedition/nxt-undici#4 and purge them in the put call.
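As a rough illustration of the store shape proposed above, here is a hypothetical in-memory CacheStore that purges expired entries on write, standing in for the node:sqlite table (class and field names are illustrative, not undici's implementation):

```javascript
// In-memory stand-in for the proposed CacheStore interface. Entries
// carry the same fields as CachePutOptions (value, vary, expires);
// expired rows are deleted inside set(), mirroring the purge-on-write
// approach referenced from nxtedition/nxt-undici#4.
class MemoryCacheStore {
  constructor () {
    this.entries = new Map()
  }

  async get (key) {
    const entry = this.entries.get(key)
    if (!entry) return undefined
    // Treat an entry at or past its expiry as a miss and drop it.
    if (entry.expires !== undefined && entry.expires <= Date.now()) {
      this.entries.delete(key)
      return undefined
    }
    return entry.value
  }

  async set ({ key, value, vary, expires }) {
    // Purge all expired entries on write, like a DELETE ... WHERE
    // expires <= now against the sqlite table would.
    for (const [k, e] of this.entries) {
      if (e.expires !== undefined && e.expires <= Date.now()) {
        this.entries.delete(k)
      }
    }
    this.entries.set(key, { value, vary, expires })
  }
}
```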

@mcollina
Member

Why can't this be simplified to something like:

interface CacheInterceptorGlobalOptions {
  store?: CacheStore;
}

// Interface responsible for storing the cached responses.
interface CacheStore {
  get(key: Dispatcher.RequestOptions): Promise<??>;
  put(key: Dispatcher.RequestOptions, opts: CachePutOptions): Promise<void>;
}

type CachePutOptions = {
  value: string;
  size: number;
  vary?: string;
  expires?: number;
  // Subject to change depending on implementation specifics, but the idea is to
  //  provide the cache store with all the info we're given in the cache control
  //  directives
};

I'm unsure we could always reduce the key to a string without overhead.

@ronag
Member

ronag commented Aug 26, 2024

I think the key can always be {method}:{path}? Unless you're interacting with a buggy server that doesn't set proper Vary headers.

@flakey5
Member

flakey5 commented Aug 26, 2024

I think the key can always be {method}:{path}? Unless you're interacting with a buggy server that doesn't set proper Vary headers.

It probably will be, but IMO it'd still be good to allow it to be overridden if some use case calls for it. It's definitely not needed right now, though, and can always be added later.

@metcoder95
Member

Depending on the scope of the Cache, we might need to also add origin to the mix (as recommended by spec as well).

@ronag
Member

ronag commented Aug 27, 2024

Depending on the scope of the Cache, we might need to also add origin to the mix (as recommended by spec as well).

I thought in this case the server should respond with a Vary header containing host. Adding origin to the key will cause problems (at least for me). Where does the spec say this?

@mcollina
Member

I still didn't get an answer as to why the cache key needs to be a string. I think we should keep it maximally flexible and pass in the request options.

@ronag
Member

ronag commented Aug 27, 2024

I think we should keep it maximally flexible and pass in the request options.

That won't work, though? You can have different request options that produce the same result. That's why the spec says you should set the key to method + path and then use the Vary header to determine which request headers are relevant.

@chrisveness

RFC 9111 Section 2 (https://www.rfc-editor.org/rfc/rfc9111.html#section-2) says

The "cache key" is the information a cache uses to choose a response and is composed from, at a minimum, the request method and target URI used to retrieve the stored response

I can see no mention of 'method+path'; I would infer the URI must be absolute, since different hosts could provide unrelated responses with the same path. Am I missing something?

I thought the ‘vary’ field cannot have a value which includes the host? According to https://httpwg.org/specs/rfc9110.html#field.vary,

The "Vary" header field in a response describes what parts of a request message, aside from the method and target URI, might have influenced the origin server's process for selecting the content of this response.

@Uzlopak
Contributor

Uzlopak commented Aug 27, 2024

method+path === "the request method and target URI"

?!

@chrisveness

https://www.ibm.com/docs/en/cics-ts/6.x?topic=concepts-components-url path = the specific resource in the host

@ronag
Member

ronag commented Aug 27, 2024

as different hosts could provide unrelated responses with the same path. Am I missing something?

If different hosts can provide unrelated responses, then they should reply with Vary: host.

I thought the ‘vary’ field cannot have a value which includes the host? According to https://httpwg.org/specs/rfc9110.html#field.vary,

That doesn't say anything about not including the Host header.

@kornelski

kornelski commented Aug 27, 2024

Cache is always split per URL, and Vary only adds additional granularity within the single URL.

(Cache is technically also split per method, but PUT/POST invalidates cached GET, so in practice you can just cache GET)

Varying by Host is a basic guarantee since RFC2616, and never needs any additional reinforcement. I'm not sure if it's forbidden in Vary, but it's certainly useless there.

@ronag
Member

ronag commented Aug 27, 2024

Cache is always split per URL, and Vary only adds additional granularity within the single URL.

(Cache is technically also split per method, but PUT/POST invalidates cached GET, so in practice you can just cache GET)

Varying by Host is a basic guarantee since RFC2616, and never needs any additional reinforcement. I'm not sure if it's forbidden in Vary, but it's certainly useless there.

Any chance you could refer to the relevant parts in the spec?

@flakey5
Member

flakey5 commented Aug 27, 2024

I still didn't get an answer why we need to have a cache key as a string. I think we should keep it maximally flexible and pass in the request options.

+1

That won't work though? You can have different request options that have the same result. That's why the spec says you should set the key to method + path and then use the vary header to determine what in the request headers is relevant.

It allows each cache store to have its own makeKey (or whatever works best for it). The store gets the full context for a request, can build its own cache key from whatever parts it cares about, and can then use that key for lookups/puts.

@metcoder95
Member

metcoder95 commented Aug 27, 2024

A response's Vary header can and should invalidate the cache if it does not match an already cached request, and a revalidation should be triggered if the criteria change.

This applies if we want to follow the spec strictly.

+1 on a custom way to generate the cache key, as long as we advise that it will cause the cache to deviate from the spec.

@kornelski

A request not matching an existing Vary doesn't invalidate anything; a response with a mismatched Vary invalidates the cache.

A good HTTP cache should have two cache levels:

  1. Look up by the URL (plus the method, if you choose to cache non-GET) to find whether the URL is varying.

  2. If it is varying, then look up the responses cached for this URL by a key derived from the request header values listed in the cached Vary response header.

If the second lookup misses, it doesn't invalidate anything yet; it only goes to fetch a response. If the server responds with the same Vary, just add the response to the second-level cache alongside the others (many responses per URL can exist).

Only when the server sends a different Vary header (including the lack of one) do you discard all existing entries for this URL and start over with this response.
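The two-level scheme described above can be sketched roughly as follows, assuming a plain Map keyed by URL at the first level and by Vary-derived header values at the second (all names are illustrative, not undici's implementation):

```javascript
// Level 1: URL -> { vary, responses }; Level 2: responses is a Map keyed
// by the request header values named in the cached Vary header.
const cache = new Map()

// Derive the second-level key from the request headers listed in Vary.
function varyKey (vary, requestHeaders) {
  return vary.map(h => `${h}:${requestHeaders[h] ?? ''}`).join(',')
}

function lookup (url, requestHeaders) {
  const entry = cache.get(url)
  if (!entry) return undefined // level-1 miss: URL not cached at all
  // Level-2 lookup; a miss here returns undefined without invalidating.
  return entry.responses.get(varyKey(entry.vary, requestHeaders))
}

function store (url, requestHeaders, responseVary, response) {
  let entry = cache.get(url)
  // A response with a different Vary (including none) discards all
  // existing entries for this URL and starts over.
  if (!entry || entry.vary.join() !== responseVary.join()) {
    entry = { vary: responseVary, responses: new Map() }
    cache.set(url, entry)
  }
  // Same Vary: add this response alongside the others for the URL.
  entry.responses.set(varyKey(responseVary, requestHeaders), response)
}
```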

@flakey5
Member

flakey5 commented Sep 4, 2024

Just as an update: I'm planning to have at least a draft PR open by this Friday. Not all of the caching directives or optional behaviors will be supported in it, just to limit its scope (the required ones will be implemented, though, ofc). Follow-up PRs will implement more.

@loynoir

loynoir commented Sep 5, 2024

It would be nice to see a Node cache similar to the Web API Cache.

https://developer.mozilla.org/en-US/docs/Web/API/Cache/matchAll

@metcoder95
Member

I'd recommend prioritizing the interceptor work first; we can then evaluate how to set up/port Cache as a Web API.

flakey5 added a commit to flakey5/undici that referenced this issue Sep 7, 2024
Implements bare-bones http caching as per rfc9111

Closes nodejs#3231
Closes nodejs#2760
Closes nodejs#2256
Closes nodejs#1146

Signed-off-by: flakey5 <73616808+flakey5@users.noreply.github.com>
@flakey5 mentioned this issue Sep 7, 2024