Add caching to visits endpoints #780

Open
acelaya opened this issue Jun 7, 2020 · 0 comments
Labels
blocked (Issues with some external dependency preventing work on them), enhancement

Comments

@acelaya
Member

acelaya commented Jun 7, 2020

Currently, loading visits is one of the bottlenecks in the application.

While v2.2.0 introduced huge performance improvements to visits loading, it's still a bit slow when loading millions of visits.

Considering visits will likely never change once processed, adding some caching to visits collections would improve performance even more.

An attempt was made for v2.2.0, but server-side caching proved to be tricky (caching individual visits had a big impact on Redis with large amounts of visits, and caching HTTP responses required complex invalidation logic).

Another possible approach would be to add client-side HTTP caching, which would have the benefit of moving the load to clients.

The proposal is to use an ETag approach with no expiration, where we will use a hash of the list-visits request params (short code, dates, page and items per page) to check whether the amount of already-located visits has changed. If it has not, we will return a 304 with no content, letting the client use the cached version.

If it has changed, we will return the new response normally.

Since visits can be forced to be relocated, we could maybe also include the highest visit location ID in the hash.
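To make the idea concrete, here is a rough sketch (in Python, not Shlink's actual code) of how such an ETag could be derived from the request params plus the visit count and max location ID. All names and values are hypothetical, purely for illustration.

```python
import hashlib
import json


def compute_visits_etag(params: dict, visit_count: int, max_location_id: int) -> str:
    """Build a weak ETag from the list-visits request params, plus the two
    values that change whenever the cached subset would change."""
    payload = json.dumps(
        {
            "shortCode": params.get("shortCode"),
            "startDate": params.get("startDate"),
            "endDate": params.get("endDate"),
            "page": params.get("page"),
            "itemsPerPage": params.get("itemsPerPage"),
            "visitCount": visit_count,
            "maxLocationId": max_location_id,
        },
        sort_keys=True,
    )
    return 'W/"' + hashlib.sha256(payload.encode()).hexdigest() + '"'


def not_modified(if_none_match: str | None, etag: str) -> bool:
    """True when the client's cached copy is still valid and a 304 can be returned."""
    return if_none_match == etag


# Usage with made-up values: same params and same counts -> 304, anything else -> 200.
params = {"shortCode": "abc123", "startDate": None, "endDate": None, "page": 1, "itemsPerPage": 50}
etag = compute_visits_etag(params, visit_count=1_000, max_location_id=987)
print(not_modified(if_none_match=etag, etag=etag))   # True  -> respond 304 with no body
print(not_modified(if_none_match=None, etag=etag))   # False -> respond 200 with the visits
```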


EDIT

Calculating a hash based on the query params is pretty fast, and reduces the load time dramatically. However, in order to invalidate the cache when new visits occur, the hash must also include the amount of visits that would be loaded for the subset and the max ID of the visit locations (since existing visits can be recalculated).

However, getting those two values takes too much time, which defeats the purpose of caching and also adds too much overhead to requests where there's no cache yet.


[2020-09-17] EDIT

A way to simplify this would be to never cache when the amount of results is smaller than the requested limit. That way, when the amount of results matches the requested items per page, we can assume the elements for the requested result set will no longer change.

That way, we always load fresh data for the "last page", while previous pages, which are complete, are always loaded from cache (see the sketch below).
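As an illustration of that rule, here is a minimal, hypothetical sketch (again not Shlink's code): a page only becomes cacheable once it is complete.

```python
def is_page_cacheable(items_returned: int, items_per_page: int) -> bool:
    """A page is only considered immutable (and therefore cacheable) once it is
    complete, i.e. it returned as many items as were requested. A partially
    filled "last page" keeps growing as new visits arrive, so it must always
    be loaded fresh."""
    return items_returned >= items_per_page


# Usage: the incomplete last page is never cached, complete pages always are.
assert is_page_cacheable(50, 50) is True   # full page    -> serve from cache next time
assert is_page_cacheable(30, 50) is False  # partial page -> always load fresh
```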

It is still to be decided whether this should be cached client-side with HTTP caching or server-side in Redis.

@acelaya acelaya added this to the 2.3.0 milestone Jun 7, 2020
@acelaya acelaya added the blocked Issues with some external dependency preventing to work on them label Jun 11, 2020
@acelaya acelaya removed this from the 2.3.0 milestone Jun 11, 2020
@acelaya acelaya changed the title Add http caching to visits endpoints Add caching to visits endpoints Sep 17, 2020
@acelaya acelaya added this to Shlink Sep 25, 2022
@acelaya acelaya moved this to Todo 🗒️ in Shlink Sep 25, 2022
@acelaya acelaya removed the status in Shlink Sep 25, 2022