Currently, loading visits is one of the bottlenecks in the application.
While v2.2.0 introduced huge performance improvements on visits loading, it's still a bit slow when loading millions of visits.
Considering visits will likely never change once processed, adding some caching to visits collections would improve performance even more.
An attempt was made for v2.2.0, but server-side caching proved to be tricky (caching individual visits had a big impact on Redis with large amounts of visits, and caching HTTP responses required complex invalidation logic).
Another possible approach would be to add client-side http caching, which would have the benefit of moving the load to clients.
The proposal is to use an ETag approach with no expiration, where we will use a hash of the list-visits request params (short code, dates, page and items per page) in order to check if the amount of already located visits has changed. If it has not, we will return a 304 with no content, letting the client use the cached version.
If it has changed, we will return the new response normally.
As visits can be forced to be relocated, we could maybe also include the highest visit location ID in the hash.
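The mechanism described above could be sketched roughly like this (a hypothetical illustration, not Shlink's actual code; the param names and the choice of SHA-256 are assumptions):

```python
import hashlib
import json
from typing import Optional, Tuple


def visits_etag(short_code: str, start_date: str, end_date: str,
                page: int, items_per_page: int) -> str:
    """Build a deterministic ETag from the list-visits request params.

    Hypothetical sketch: same params always produce the same ETag,
    so on its own this never invalidates (see the EDIT below).
    """
    key = json.dumps(
        [short_code, start_date, end_date, page, items_per_page],
        separators=(",", ":"),
    )
    return '"%s"' % hashlib.sha256(key.encode()).hexdigest()


def handle_request(params: dict, if_none_match: Optional[str]) -> Tuple[int, str]:
    """Return 304 with no body when the client's If-None-Match still matches."""
    etag = visits_etag(**params)
    if if_none_match == etag:
        return 304, ""   # client keeps using its cached copy
    return 200, etag     # serve a fresh body with the new ETag
```

On a repeat request the client sends the previously received ETag in `If-None-Match`; if it matches, the server can skip loading the visits entirely and answer 304.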
EDIT
Calculating a hash based on the query params is pretty fast, and reduces the load time dramatically. However, in order to invalidate the cache when new visits occur, the hash must also include the amount of visits that would have been loaded for the subset, and the max ID of the visit locations (as existing visits can be recalculated).
However, getting those two takes too much time, which defeats the purpose of caching itself, and also adds too much overhead to requests where there's no cache yet.
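The invalidation-aware variant would fold those two extra values into the hash, something like the following (again a hypothetical sketch; the function and parameter names are made up for illustration):

```python
import hashlib


def invalidating_etag(params_key: str, visits_count: int,
                      max_location_id: int) -> str:
    """ETag that changes when new visits arrive (visits_count grows)
    or when existing visits are relocated (max_location_id grows).

    The catch described above: obtaining visits_count and
    max_location_id requires extra queries on every request,
    whose cost can cancel out the benefit of the cache.
    """
    key = f"{params_key}|{visits_count}|{max_location_id}"
    return hashlib.sha256(key.encode()).hexdigest()
```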
[2020-09-17] EDIT
A way to simplify this would be to never cache when the amount of results is smaller than the requested limit. That way, when the amount of results matches the requested items per page, we can assume the elements for the requested result set will no longer change.
That way, we always load fresh data for the "last page", while previous ones, which are complete, are always loaded from cache.
Still to be decided if this should be cached client-side with HTTP cache or server-side in redis.
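The page-completeness rule above boils down to a one-line check, regardless of whether the cache ends up living client-side or in Redis (a minimal sketch; the function name is hypothetical):

```python
def is_cacheable(results_count: int, items_per_page: int) -> bool:
    """Cache only 'complete' pages.

    If fewer results than requested came back, this is the last
    page and new visits may still be appended to it, so it must
    always be loaded fresh.
    """
    return results_count == items_per_page
```

For example, with 10 items per page, a page that returned 10 results is complete and safe to cache forever, while one that returned 7 is the growing last page and is served fresh every time.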