
[Feature] Add a CachedReqwestProvider to cache RPC requests using a ReqwestProvider #770

Open
puma314 opened this issue May 22, 2024 · 12 comments
Labels
enhancement New feature or request

Comments

@puma314

puma314 commented May 22, 2024

Component

provider, pubsub

Describe the feature you would like

For use-cases like SP1-Reth or Kona, we often want to execute a (historical) block, but we don't have the entire state in memory, so we execute the block with a ProviderDb that fetches accounts, storage, etc. over RPC. Fetching from the network is slow and often takes minutes for all the accesses required by an entire block.

We often re-run these blocks to debug things or tune performance, and each iteration is slow because it has to wait for all the network requests again. It would be nice to add a very simple caching layer on top of ReqwestProvider that caches the results of RPC calls to a file (or some other easy-to-set-up format) and checks the cache before sending a network request.

This would speed up iteration time for use-cases like Kona and SP1-Reth tremendously.

An interface like this might make sense:

let provider = ReqwestProvider::new_http(rpc_url).cache("my_file.txt")

In our case, we are usually querying old blocks (not near the tip of the chain), so re-org awareness is not important for our use-case. We just want a really simple caching layer.
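
For illustration, a minimal sketch of that cache-first flow (all types and methods here are hypothetical, not existing alloy APIs):

use std::collections::HashMap;

// Hypothetical sketch of the cache-first flow described above; none of
// these types or methods exist in alloy today.
struct CachedReqwestProvider {
    // (method, params) key -> raw JSON response
    cache: HashMap<String, String>,
    // the inner ReqwestProvider would live here
}

impl CachedReqwestProvider {
    fn cached_call(&mut self, key: &str, fetch: impl FnOnce() -> String) -> String {
        if let Some(hit) = self.cache.get(key) {
            return hit.clone(); // cache hit: no network round trip
        }
        let resp = fetch(); // cache miss: fall through to the RPC
        self.cache.insert(key.to_string(), resp.clone());
        resp
    }
}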

Additional context

No response

@puma314 puma314 added the enhancement New feature or request label May 22, 2024
@gakonst
Member

gakonst commented May 22, 2024

Could this be a tower layer?

Seeing https://docs.rs/tower/latest/tower/ready_cache/cache/struct.ReadyCache.html - cc @mattsse does this work?
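
For context, a tower layer wraps one Service in another; a minimal sketch of that shape, with the caching internals left hypothetical and elided:

use tower::Layer;

// Minimal sketch of the tower layering shape: a Layer wraps a Service,
// producing a new Service. The caching internals here are hypothetical.
struct CachingLayer;

struct CachingService<S> {
    inner: S,
    // a request -> response cache would live here
}

impl<S> Layer<S> for CachingLayer {
    type Service = CachingService<S>;

    fn layer(&self, inner: S) -> Self::Service {
        CachingService { inner }
    }
}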

@puma314
Author

puma314 commented May 23, 2024

I'm not fully sure I understand how tower works, but noting that we'd want to save things to a file so they're persisted across instantiations (and not just keep the cache in memory, for example).

@prestwich
Member

We won't add caching at the transport layer via tower, because caching (unlike rate limiting or retrying) needs to be aware of the RPC semantics, and potentially of the provider heartbeat task, so that it can invalidate caches on new blocks and reorgs. This means it needs to be an alloy_provider::Layer at the provider level, producing a CachingProvider<P, T, N>, rather than a tower::Layer producing a CachingTransport<T>.

This is blocked by #736 (which is pretty straightforward to resolve)

Is the use case here making a high volume of requests against specific deep historical states? It sounds like you actually don't want to cache to a file. You want an in-memory cache that is persisted to a file when your program stops? I'm generally not in favor of caching to/from a file directly: responses get invalidated regularly, fs access degrades performance, and the target alloy user doesn't have an archive node and doesn't make queries against deep state. Would it be enough to have the cache internals be (de)serializable, plus a way to instantiate the cache with data in it?
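
A rough sketch of the provider-level shape this implies, using simplified stand-in types rather than alloy's actual API:

use std::collections::HashMap;

// Simplified stand-ins, not alloy's actual API: a provider-level cache can
// key on RPC semantics and be invalidated by chain events, which an opaque
// transport-level wrapper cannot do.
struct CachingProvider<P> {
    inner: P,
    // (method, params, block tag) -> response
    cache: HashMap<String, String>,
}

impl<P> CachingProvider<P> {
    // Would be driven by the provider heartbeat task on new heads/reorgs.
    fn on_chain_event(&mut self, reorged: bool) {
        if reorged {
            // Cached responses at non-final heights may now be stale.
            self.cache.clear();
        }
    }
}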

@gakonst
Member

gakonst commented May 23, 2024

This means it needs to be an alloy_provider::Layer at the provider level, producing a CachingProvider<P, T, N>, rather than a tower::Layer producing a CachingTransport<T>.

Good point, supportive.

It sounds like you actually don't want to cache to a file. You want an in-memory cache that is persisted to a file when your program stops?

@puma314 basically this means:

  1. first run, you start with no cache file on disk
  2. first request goes to the RPC and gets cached
  3. second request goes to the cache
  4. when you Ctrl+C, the cache's Drop impl gets called, persisting everything to disk (sketched below)
  5. when you start the process again, either the entire file is loaded into memory or the data is loaded "just in time" from the file; either would work, I think
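
A minimal sketch of step 4 (FileCache is a hypothetical type; serde_json is assumed for the on-disk format):

use std::collections::HashMap;

// Hypothetical sketch of step 4: persist the cache in Drop. Note that a
// bare Ctrl+C (SIGINT) normally kills the process without running
// destructors, so this only fires on ordinary scope exit unless a signal
// handler is installed.
struct FileCache {
    path: String,
    entries: HashMap<String, String>,
}

impl Drop for FileCache {
    fn drop(&mut self) {
        // Best-effort only: errors here cannot be surfaced to the caller,
        // which is the fallibility concern raised below.
        if let Ok(json) = serde_json::to_string(&self.entries) {
            let _ = std::fs::write(&self.path, json);
        }
    }
}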

@puma314
Author

puma314 commented May 23, 2024

Yup, that sounds great. @prestwich our use-case is that we are querying getProof and getStorage on blocks potentially hours or more in the past (so blocks that are well past the reorg window). We are using this for generating a ZKP, and we wouldn't want to generate a ZKP for a block that could be reorged, if that makes sense.

@gakonst's suggestion looks great to me as a potential devex.

@prestwich
Member

when you Ctrl+C, the cache's Drop impl gets called, persisting everything to disk

Serialization and fs ops are fallible and can't be reliably used in a Drop, so I wouldn't recommend this approach.

More broadly though, a filesystem-backed cache of finalized responses is not broadly applicable, and requires us to make decisions about the user's fs. I am not in favor of including it in the main alloy crates. A memory cache that can be loaded from the fs at runtime and serialized to the fs on demand is applicable to a lot of users, and could live in the main provider crate. Would that fit your need?

Assuming you're running your own infra, the need may also be better served by accessing the reth db or static files directly? If running alongside reth, retrieving proofs and then storing them to the file system duplicates data that's already in the file system, no?

@gakonst
Member

gakonst commented May 23, 2024

Serialization and fs ops are fallible and can't be reliably used in a Drop, so I wouldn't recommend this approach.
More broadly though, a filesystem-backed cache of finalized responses is not broadly applicable, and requires us to make decisions about the user's fs. I am not in favor of including it in the main alloy crates.

I've used this method multiple times before for debugging (e.g. in MEV Inspect) and it's generally been fine, so I personally don't worry about the fallibility, but I'm OK with doing this as a separate crate.

A memory cache that can be loaded from the fs at runtime and serialized to the fs on demand is applicable to a lot of users, and could live in the main provider crate. Would that fit your need?

How should the cache be populated in this case? Still via ProviderLayer, where each method populates an LRU of the data on cache miss? And is it the responsibility of the user to flush the cache to disk?

Assuming you're running your own infra, the need may also be better served by accessing the reth db or static files directly? If running alongside reth, retrieving proofs and then storing them to the file system duplicates data that's already in the file system, no?

Proofs aren't part of the Reth DB; they get generated on the fly, so I don't think this would work.

@puma314
Author

puma314 commented May 23, 2024

A memory cache that can be loaded from the fs and saved to the fs would work for me. I'm not running my own infra in this case; the point is that, for basically any chain, we can get all the storage slots & proofs for running a block in a zkVM without needing a local synced node for that chain. It's a lot lower friction if we can just plug in an RPC vs. having to sync a reth instance. (Also, I'm not sure if reth has getProof implemented yet.)

let mut cache = MemoryCache::load("file.txt");
let provider = ReqwestProvider::new_http(...).with_cache(&cache);
// do stuff with provider
cache.save("file.txt");

seems totally fine to me.
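
Fleshed out, that MemoryCache could be as small as this sketch (hypothetical type; serde_json assumed for the file format):

use std::collections::HashMap;
use std::fs;

// Hypothetical MemoryCache matching the sketch above: explicit load and
// save, no reliance on Drop. serde_json is assumed for the on-disk format.
#[derive(Default)]
struct MemoryCache {
    entries: HashMap<String, String>,
}

impl MemoryCache {
    // Start from the file if it exists and parses; otherwise start empty.
    fn load(path: &str) -> Self {
        let entries = fs::read_to_string(path)
            .ok()
            .and_then(|s| serde_json::from_str(&s).ok())
            .unwrap_or_default();
        Self { entries }
    }

    // Explicit, fallible save, so the caller sees any fs/serialization error.
    fn save(&self, path: &str) -> std::io::Result<()> {
        let json = serde_json::to_string(&self.entries)
            .map_err(|e| std::io::Error::new(std::io::ErrorKind::Other, e))?;
        fs::write(path, json)
    }
}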

@gakonst
Member

gakonst commented May 24, 2024

SG re: the API above! Confirming: if you do stuff with the provider that hits the actual backend and not the cache, the new file.txt should include 1) all the requests which were not cached before, and 2) all the previous contents of the cache?

eth_getProof is implemented in Reth, but not the historical variant with arbitrary lookback, due to limitations of the Erigon DB design, which we inherit.

@prestwich
Member

I've used this method multiple times before for debugging (e.g. in MEV Inspect) and it's generally been fine, so I personally don't worry about the fallibility, but I'm OK with doing this as a separate crate.

Panics in drops cause aborts; you can do it, but it's not a decision we want to make on behalf of all users, as we don't know what conditions they're running in.

A memory cache that can be loaded from the fs and saved to the fs would work for me. I'm not running my own infra in this case; the point is that, for basically any chain, we can get all the storage slots & proofs for running a block in a zkVM without needing a local synced node for that chain. It's a lot lower friction if we can just plug in an RPC vs. having to sync a reth instance. (Also, I'm not sure if reth has getProof implemented yet.)

let mut cache = MemoryCache::load("file.txt");
let provider = ReqwestProvider::new_http(...).with_cache(&cache);
// do stuff with provider
cache.save("file.txt");

seems totally fine to me.

instantiation should run through the builder API, so the sketch here is something like:

/// Cache object
struct Cache { ... }

/// Caching configuration object
struct CachingLayer { cache: Option<Cache>, /* other fields? */ }

/// Provider with cache
struct CachingProvider<P, N, T> { inner: P, cache: Cache }

let provider = builder.layer(CachingLayer::from_file("file.txt")?).http(url);
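
Filling in the layering piece with simplified stand-ins (the real provider layering trait in alloy is also generic over the transport and network):

use std::collections::HashMap;
use std::sync::{Arc, Mutex};

// Simplified stand-in for alloy's provider layering trait; the real one is
// also generic over the transport T and network N.
trait ProviderLayer<P> {
    type Provider;
    fn layer(&self, inner: P) -> Self::Provider;
}

// Shared, thread-safe cache handle, so it can still be saved after use.
type SharedCache = Arc<Mutex<HashMap<String, String>>>;

struct CachingLayer {
    cache: SharedCache,
}

struct CachingProvider<P> {
    inner: P,
    cache: SharedCache,
}

impl<P> ProviderLayer<P> for CachingLayer {
    type Provider = CachingProvider<P>;

    fn layer(&self, inner: P) -> Self::Provider {
        CachingProvider { inner, cache: Arc::clone(&self.cache) }
    }
}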

do you have a ballpark for the number of proofs etc. you intend to cache?

@puma314
Author

puma314 commented May 28, 2024

I think we would need low 100s of proofs per block, since it's all the accounts/state touched during a block.

@prestwich
Member

so I think actionable steps for implementing this are:

  • continue feature: ProviderCall #788
  • develop cache invalidation policies
    • make a chart of each rpc endpoint with its safe caching point (see the sketch below)
    • e.g. "chainId" is safe to cache at "latest", while "getProof" is safe to cache at "final" or older
    • how is caching handled for block number/hash?
    • does the caching provider make an extra request to determine whether the block is sufficiently old to cache the response?
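
A sketch of what that per-endpoint chart could look like in code; the method names are real JSON-RPC endpoints, but the policy assignments are illustrative, not a vetted list:

// Illustrative per-endpoint caching policy; the assignments below are
// examples from this thread, not a vetted list.
enum CacheAfter {
    Latest,    // safe to cache as soon as the response is observed
    Finalized, // only cache once the queried block is finalized
    Never,     // depends on live state; don't cache
}

fn cache_policy(method: &str) -> CacheAfter {
    match method {
        "eth_chainId" => CacheAfter::Latest,
        "eth_getProof" | "eth_getStorageAt" => CacheAfter::Finalized,
        "eth_blockNumber" | "eth_gasPrice" => CacheAfter::Never,
        // Conservative default until the endpoint is charted.
        _ => CacheAfter::Never,
    }
}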
