Enhancement: Store Deserialized HttpResponse for Improved Performance #38

Open
tusharmath opened this issue Aug 22, 2023 · 11 comments

@tusharmath

Enhancement: Store Deserialized HttpResponse for Improved Performance

To enhance the performance of the HTTP cache library, consider adding the capability to store the deserialized version of the HttpResponse. This would help developers avoid repeated deserialization after retrieving a response from the cache.

  • Current Behavior:

    • When retrieving a cached HttpResponse, developers need to deserialize the response every time.
    • This leads to repetitive and unnecessary processing, especially for frequently accessed responses.
  • Proposed Enhancement:

    • Alongside the serialized response, store a deserialized version.
    • When the cache is accessed, provide an option to retrieve the already deserialized HttpResponse.

Benefits:

  • Performance: Faster retrieval times for cached responses, especially for applications where deserialization time is significant.
  • Ease of Use: Developers can bypass the deserialization step, simplifying their code.

Considerations:

  • Memory Usage: Storing deserialized versions might increase memory usage. This needs to be benchmarked and considered during implementation.
  • Cache Invalidation: Ensure that both the serialized and deserialized versions are invalidated together to avoid data mismatches.
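
As a rough illustration of the idea (CachedEntry and its fields below are hypothetical names, not part of the current API), a cache entry could hold both forms side by side:

use http_cache::HttpResponse;
use http_cache_semantics::CachePolicy;

/// Hypothetical entry keeping both representations so a cache hit can skip
/// the decode step entirely.
pub struct CachedEntry {
    /// Bytes as persisted by the backing store.
    pub serialized: Vec<u8>,
    /// Already-deserialized response, returned directly on a hit.
    pub deserialized: HttpResponse,
    /// Freshness data; invalidated together with both forms above.
    pub policy: CachePolicy,
}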

Let me know what you think.

@06chaynes
Owner

06chaynes commented Aug 23, 2023

I would be interested in seeing some benchmarks of how things are now as well as with storing the deserialized response. When I have some time I can look into adding those tests, unless that's something you would feel comfortable submitting.

@tusharmath
Author

tusharmath commented Aug 24, 2023

Happy to share some internal insights on this. In our benchmarking, using a custom cache that caches deserialized responses easily gives us around a 30% performance improvement in cache-hit scenarios. For context, we are building a graphql-proxy layer that hits this API: https://jsonplaceholder.typicode.com/posts.

This is a simple API, and not a very heavy one. The penalty of not having such caching increases linearly with payload size.

Search for deserialize in the flamegraph below.
[flamegraph image]

BTW - This is a fantastic library and we love how performant it is in its default tuning.

@06chaynes
Owner

Thanks, glad you found it of some use!

Thanks for sharing your findings, and yeah that is a fairly large increase in performance. I will hopefully have some time soon(ish) to look into this further.

@06chaynes
Owner

06chaynes commented Aug 25, 2023

@tusharmath So I was considering ways this might be accomplished, and I think one possible route is implementing a custom CacheManager that doesn't serialize things before storing them (and of course wouldn't need to deserialize them coming out).

Playing around, I did create a version of this for Moka that at least works in the test I wrote for it: MokaManagerDeserialized

I don't have any benchmarks written so I don't know how this compares to what exists but if you'd like to try it out feel free to see if you notice any improvements.

Edit: Where you would use MokaManager::default(), you would use MokaManagerDeserialized::default().
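
In the middleware setup that swap would look roughly like this (a sketch only, assuming the http-cache-reqwest middleware; everything else stays the same):

client = client.with(Cache(HttpCache {
    mode: CacheMode::Default,
    manager: MokaManagerDeserialized::default(),
    options: HttpCacheOptions::default(),
}))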

@06chaynes
Owner

You could add the manager implementation to your app as well; I think this should work:

use http_cache::{CacheManager, HttpResponse, Result};

use std::{fmt, sync::Arc};

use http_cache_semantics::CachePolicy;
// `sync()` below comes from `ConcurrentCacheExt` (a pre-0.12 moka API).
use moka::future::{Cache, ConcurrentCacheExt};

#[derive(Clone)]
pub struct MokaManagerDeserialized {
    pub cache: Arc<Cache<String, Store>>,
}

impl fmt::Debug for MokaManagerDeserialized {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        f.debug_struct("MokaManagerDeserialized").finish_non_exhaustive()
    }
}

impl Default for MokaManagerDeserialized {
    fn default() -> Self {
        Self::new(Cache::new(42))
    }
}

/// Cached entry kept in memory as-is: the response and its policy are stored
/// directly, so `put` does no serialization and `get` does no deserialization.
#[derive(Clone, Debug)]
pub struct Store {
    response: HttpResponse,
    policy: CachePolicy,
}

impl MokaManagerDeserialized {
    pub fn new(cache: Cache<String, Store>) -> Self {
        Self { cache: Arc::new(cache) }
    }
    pub async fn clear(&self) -> Result<()> {
        self.cache.invalidate_all();
        self.cache.sync();
        Ok(())
    }
}

#[async_trait::async_trait]
impl CacheManager for MokaManagerDeserialized {
    async fn get(
        &self,
        cache_key: &str,
    ) -> Result<Option<(HttpResponse, CachePolicy)>> {
        let store: Store = match self.cache.get(cache_key) {
            Some(d) => d,
            None => return Ok(None),
        };
        Ok(Some((store.response, store.policy)))
    }

    async fn put(
        &self,
        cache_key: String,
        response: HttpResponse,
        policy: CachePolicy,
    ) -> Result<HttpResponse> {
        // Clone so a copy goes into the cache while the original response
        // is handed back to the caller.
        let store = Store { response: response.clone(), policy };
        self.cache.insert(cache_key, store).await;
        self.cache.sync();
        Ok(response)
    }

    async fn delete(&self, cache_key: &str) -> Result<()> {
        self.cache.invalidate(cache_key).await;
        self.cache.sync();
        Ok(())
    }
}

@06chaynes
Owner

@tusharmath went ahead and made this its own crate for now. Will publish the first release soon.

You should be able to use it as a drop-in replacement for the moka manager that comes with the core library.

# Cargo.toml
http-cache-reqwest = { version = "0.11.2", default-features = false }
http-cache-mokadeser = "0.1.0"

use http_cache_reqwest::{Cache, CacheMode, HttpCache, HttpCacheOptions};
use http_cache_mokadeser::MokaManager;

client = client.with(Cache(HttpCache {
    mode: CacheMode::Default,
    manager: MokaManager::default(),
    options: HttpCacheOptions::default(),
}));

@tusharmath
Author

Thanks @06chaynes! I will try it and compare performance. However, from first impressions, we still need to deserialize the response every time, which is a bit expensive. I think what we are saving here is the serialization cost.

@06chaynes
Owner

@tusharmath Sounds good! Though I am curious where you see this occurring, I might be misunderstanding or missing something so just want to be sure.

@tusharmath
Author

The Store keeps the complete HttpResponse:

pub struct Store {
    response: HttpResponse,
    policy: CachePolicy,
}

This means when I try to get the HttpResponse, I end up parsing the body again and converting the bytes to a serde_json::Value.
This happens over and over again, resulting in a degradation in performance. If the store instead had the capability to keep a pre-deserialized JSON response, it would only be deserialized once and stored in the cache.
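
For example, a hypothetical variant of the store (illustrative names only, not part of the crate) could parse the body once at insert time and keep the result next to the response:

use http_cache::HttpResponse;
use http_cache_semantics::CachePolicy;

#[derive(Clone, Debug)]
pub struct StoreWithJson {
    response: HttpResponse,
    policy: CachePolicy,
    /// Parsed once when the entry is stored; None for non-JSON bodies.
    parsed_body: Option<serde_json::Value>,
}

impl StoreWithJson {
    fn new(response: HttpResponse, policy: CachePolicy) -> Self {
        // Deserialize the body a single time here instead of on every cache hit.
        let parsed_body = serde_json::from_slice(&response.body).ok();
        Self { response, policy, parsed_body }
    }
}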

@06chaynes
Owner

@tusharmath The middleware will need to return the response as an HttpResponse for me to work with; as of right now I don't have a way around this. I'm open to suggestions, but right now I can't picture a clear path forward for the requested changes.

@06chaynes
Owner

I've been revisiting the internals of the project after some of the latest stabilized updates to the language, and I think one of the changes I could make is getting rid of the HttpResponse struct entirely, and therefore the conversion. I don't have a great deal of free time to work on things right now, but I should be able to make some progress towards this eventually.
