Refactor QueryCache to improve concurrency and performance #14222
Comments
I almost have a prototype ready for the above solution; I will add benchmarking results here comparing performance.
Sounds interesting, keen to see if we measure a performance improvement!
I got busy with other stuff but found some time to run an initial benchmark for this. I created an LRUQueryCacheV2 with the changes recommended above and micro-benchmarked it. It creates 16 (for this test) QueryCacheSegments, each with its own in-memory map to store a composite key and value. The composite key is simply a combination of the segment's CacheKey and the Query. Some parts of the eviction logic are yet to be fully written for V2, like clearing entries when a Lucene segment is merged. I also ran the existing unit tests for QueryCache on top of LRUQueryCacheV2 as a high-level correctness check: 16 are passing and 10 failing (basically due to the incomplete eviction logic). Coming to results: here v1 refers to the existing QueryCache and v2 refers to my version of QueryCache. Performance Comparison: v1 vs. v2
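As a rough illustration of the composite key described above, here is a minimal sketch. The class name `CompositeKey`, the `segmentIndex` helper, and the stand-in types (a `long` for the segment's `CacheKey`, a `String` for the `Query`) are all hypothetical, chosen only to keep the sketch self-contained; they are not the prototype's actual code.

```java
import java.util.Objects;

// Illustrative composite key: combines the per-segment cache key with the
// query, so that many queries against few segments still hash to many
// distinct cache segments.
final class CompositeKey {
    private final long segmentCacheKey; // stand-in for IndexReader.CacheKey
    private final String query;         // stand-in for a Lucene Query

    CompositeKey(long segmentCacheKey, String query) {
        this.segmentCacheKey = segmentCacheKey;
        this.query = query;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof CompositeKey)) return false;
        CompositeKey other = (CompositeKey) o;
        return segmentCacheKey == other.segmentCacheKey && query.equals(other.query);
    }

    @Override
    public int hashCode() {
        return Objects.hash(segmentCacheKey, query);
    }

    // Pick which QueryCacheSegment owns this entry. Because the query
    // participates in the hash, 5 Lucene segments x 10,000 queries spread
    // across all cache segments rather than only 5 of them.
    int segmentIndex(int numCacheSegments) {
        return Math.floorMod(hashCode(), numCacheSegments);
    }
}
```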
Raw results:
I only assumed 16 Lucene segments for this test, which is low for an OpenSearch node with multiple indices. With more segments, we will see bigger improvements. Also, eviction for segment merges will be handled on a separate thread for v2, which is unaccounted for here, but even with that it should be highly performant.
I re-ran the test with a 1 MB cache, assuming 50 Lucene segments. Performance Comparison: v1 vs v2
This shows a nice improvement on the microbenchmark! But in a typical workload we expect to spend most of our time executing queries rather than caching them, which reduces the time spent acquiring locks, and hence the contention, so the gains could become negligible -- or not. Is it possible in the benchmarking framework to simulate a more realistic workload?
It depends on the workload. Due to the query cache's caching policy, not all queries will be cached. However, with a workload that heavily utilizes the query cache, we can still expect overall benefits, maybe not as pronounced as the microbenchmark results above. These results suggest that the cache does not have to be a bottleneck itself.
Yes. I am trying that as well. Do you have any recommendations, such as any existing benchmarks I could use for this use case?
Have you looked at luceneutil?
I did, but I was not sure whether it works with the cache enabled. As with most benchmarks, caches are usually disabled to ensure accurate results.
I had a quick look and there are a lot of lines like
Ah thanks for this! I will try uncommenting that line and then running the benchmarks. |
Description
Given the significant role of LRUQueryCache in Lucene, I see opportunities to enhance its performance. Although there have been many discussions on this topic, like here, they haven't led anywhere.
Currently, QueryCache uses a Map<CacheKey, LeafCache>, where CacheKey identifies a Lucene segment and LeafCache holds that segment's cached entries.
Current Issue:
We use a global read/write lock to put items into the cache. This allows only one writer at a time, blocking all other writers and readers.
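The pattern described above can be illustrated with a simplified sketch (this is not Lucene's actual code, just a generic cache guarded by one global lock; the class name `GlobalLockCache` is hypothetical):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Simplified illustration of the current scheme: every mutation funnels
// through one global read/write lock, so a single put excludes all other
// writers and all readers for its duration.
final class GlobalLockCache<K, V> {
    private final Map<K, V> map = new HashMap<>();
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    V get(K key) {
        lock.readLock().lock();     // many readers may share this lock...
        try {
            return map.get(key);
        } finally {
            lock.readLock().unlock();
        }
    }

    void put(K key, V value) {
        lock.writeLock().lock();    // ...but one writer excludes everyone
        try {
            map.put(key, value);
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```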
Even if we were to introduce multiple read/write locks (via multiple partitions), it won't help while CacheKey is our key: if there are only 5 segments but 10,000 unique queries, we end up with just 5 unique cache keys. This means a large number of requests compete for the same write lock, significantly degrading performance by either waiting or skipping the cache when the lock cannot be obtained.
One simple Solution:
Introduce a Composite Key: Map<CompositeKey, CacheCount>
Partition the Cache for Better Concurrency: route each entry to a partition via hash(CacheKey, Query) % partition_count (as an example)
I also see we use uniqueQueries/mostRecentlyUsedQueries here to maintain an LRU list for the queries. I think we can remove it and instead maintain a separate LRU list per partition, using CompositeKey.
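The two ideas above (composite-key routing plus a per-partition LRU list) can be sketched as follows. All names here are illustrative assumptions, not the prototype's code: `PartitionedQueryCache`, the per-partition entry cap, the string key encoding, and the stand-in types (a `long` for CacheKey, a `String` for Query). Each partition uses an access-order `LinkedHashMap` as its own LRU list, and contention is limited to one partition at a time.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

final class PartitionedQueryCache<V> {
    private static final int MAX_ENTRIES_PER_PARTITION = 1024; // illustrative cap

    private final Partition<V>[] partitions;

    @SuppressWarnings("unchecked")
    PartitionedQueryCache(int partitionCount) {
        partitions = (Partition<V>[]) new Partition[partitionCount];
        for (int i = 0; i < partitionCount; i++) {
            partitions[i] = new Partition<>();
        }
    }

    // hash(CacheKey, Query) % partition_count, as suggested above.
    private Partition<V> partitionFor(long cacheKey, String query) {
        int h = Objects.hash(cacheKey, query);
        return partitions[Math.floorMod(h, partitions.length)];
    }

    void put(long cacheKey, String query, V value) {
        Partition<V> p = partitionFor(cacheKey, query);
        synchronized (p) { // contention limited to a single partition
            p.lru.put(cacheKey + "|" + query, value);
        }
    }

    V get(long cacheKey, String query) {
        Partition<V> p = partitionFor(cacheKey, query);
        synchronized (p) {
            return p.lru.get(cacheKey + "|" + query); // also refreshes LRU order
        }
    }

    private static final class Partition<V> {
        // Access-order LinkedHashMap: the eldest (least recently used) entry
        // is evicted once the partition exceeds its cap.
        final LinkedHashMap<String, V> lru =
            new LinkedHashMap<String, V>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<String, V> eldest) {
                    return size() > MAX_ENTRIES_PER_PARTITION;
                }
            };
    }
}
```

One design note: a `synchronized` block per partition is the simplest correct choice for a sketch; a real implementation could use per-partition read/write locks or lock-free structures instead.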
For Eviction & Cleanup, we can collect stale CacheKey entries in a Set<CacheKey> when a segment is closed due to a merge. A background thread later iterates over cache entries and clears them, similar to how RequestCache in OpenSearch works.
Let me know what you think. I think it might work pretty well (and is simple enough), but I am open to other ideas.
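The deferred-cleanup idea above can be sketched roughly as below. The names (`StaleKeySweeper`, `onSegmentClosed`, `sweep`) and the stand-in types are assumptions for illustration only: the segment-close path just records the stale key in O(1), and a background thread does the actual removal later.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

final class StaleKeySweeper {
    // Cache grouped by segment cache key (a long stand-in), so a whole
    // segment's entries can be dropped at once.
    private final Map<Long, Map<String, Object>> cache = new ConcurrentHashMap<>();
    // Stale keys collected from the segment-close path.
    private final Set<Long> staleSegmentKeys = ConcurrentHashMap.newKeySet();

    void put(long segmentCacheKey, String query, Object value) {
        cache.computeIfAbsent(segmentCacheKey, k -> new ConcurrentHashMap<>())
             .put(query, value);
    }

    // Called when a Lucene segment is closed due to a merge: O(1), no scan.
    void onSegmentClosed(long segmentCacheKey) {
        staleSegmentKeys.add(segmentCacheKey);
    }

    // Run periodically on a background thread, similar in spirit to how
    // OpenSearch's RequestCache cleans up stale entries.
    void sweep() {
        for (Long key : staleSegmentKeys) {
            cache.remove(key);
            staleSegmentKeys.remove(key);
        }
    }

    boolean contains(long segmentCacheKey, String query) {
        Map<String, Object> perSegment = cache.get(segmentCacheKey);
        return perSegment != null && perSegment.containsKey(query);
    }
}
```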