Use Collector.setWeight to improve aggregation performance (for special cases) #10954
Labels: enhancement, Search:Performance, v2.13.0, v3.0.0
Lucene added a new `setWeight` method to the `Collector` interface a while back (see https://issues.apache.org/jira/browse/LUCENE-10620), specifically to give collectors access to the `Weight.count()` method.

`Weight.count()` only has a few cases where it returns something other than `-1` (the value meaning "I can't give you a cheap count"), but the cases where it does return a count are pretty useful -- mostly "match all" or "match none", though a single term query will return "I match exactly this many" if there are no deletions in the current segment (since it just reads the term's doc freq).

I believe this can be useful to short-circuit some aggregation logic, since aggregations all extend `Collector`.
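For concreteness, here's a minimal sketch (in plain Lucene terms, not OpenSearch's actual aggregator classes) of a `Collector` that captures the `Weight` via `setWeight` and consults `weight.count(leafReaderContext)` per segment. The class name and structure are illustrative assumptions, not existing code:

```java
// Hypothetical sketch: a Collector that stores the Weight passed via setWeight()
// and uses Weight#count(LeafReaderContext) to skip segments with no matches.
import java.io.IOException;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.search.CollectionTerminatedException;
import org.apache.lucene.search.Collector;
import org.apache.lucene.search.LeafCollector;
import org.apache.lucene.search.Scorable;
import org.apache.lucene.search.ScoreMode;
import org.apache.lucene.search.Weight;

public class CountAwareCollector implements Collector {
  private Weight weight; // populated by IndexSearcher before collection starts

  @Override
  public void setWeight(Weight weight) {
    this.weight = weight;
  }

  @Override
  public LeafCollector getLeafCollector(LeafReaderContext context) throws IOException {
    // count() is -1 when no cheap count is available; 0 means "match none",
    // maxDoc() means "match all" for this segment.
    final int count = (weight == null) ? -1 : weight.count(context);
    if (count == 0) {
      // Nothing in this segment can match: tell IndexSearcher to skip it.
      throw new CollectionTerminatedException();
    }
    final boolean matchesAllDocs = count == context.reader().maxDoc();
    return new LeafCollector() {
      @Override
      public void setScorer(Scorable scorer) {}

      @Override
      public void collect(int doc) {
        // Per-doc work would go here; matchesAllDocs could be used to pick a
        // cheaper accumulation strategy for this segment.
      }
    };
  }

  @Override
  public ScoreMode scoreMode() {
    return ScoreMode.COMPLETE_NO_SCORES;
  }
}
```

Throwing `CollectionTerminatedException` from `getLeafCollector` is the existing Lucene convention for telling `IndexSearcher` to skip a segment, which is how the "match none" case could be short-circuited.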
These are the special cases I've been able to think of where `weight.count(leafReaderContext)` could hint at smarter computation of aggregations:

- If the query matches no documents in the current segment (i.e. `weight.count(leafReaderContext) == 0`), then the count for every bucket is 0. (If the min count is greater than 0, then you don't need to compute any buckets for this segment.)
- If the query matches every document in the current segment (i.e. `weight.count(leafReaderContext) == leafReaderContext.reader().maxDoc()`), then the count of hits in a bucket (from the current segment) is determined entirely by the count of the bucket, which may be cheap to compute (e.g. doc freq for a terms aggregation, maybe read count from the BKD tree for a range aggregation). See the sketch after this list.
- … `weight.count(leafReaderContext)`.

I didn't give it a lot of thought, so there might be some more that I'm missing.
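To illustrate the second case, here is a hedged sketch of how the "matches every document" hint could let a terms-style aggregation read per-bucket counts straight from the terms dictionary instead of collecting hits. `SegmentTermCounts` and `tryCountFromDocFreq` are made-up names for illustration, not part of OpenSearch or Lucene:

```java
// Hypothetical sketch: if weight.count(ctx) == ctx.reader().maxDoc(), per-term
// bucket counts for a keyword-like field can be read from the terms dictionary.
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.index.Terms;
import org.apache.lucene.index.TermsEnum;
import org.apache.lucene.search.Weight;
import org.apache.lucene.util.BytesRef;

final class SegmentTermCounts {

  /** Returns per-term counts for {@code field}, or null if the shortcut doesn't apply. */
  static Map<String, Long> tryCountFromDocFreq(Weight weight, LeafReaderContext ctx, String field)
      throws IOException {
    int count = weight.count(ctx);
    // docFreq() counts deleted docs too, so require a deletion-free, match-all segment
    // (count == maxDoc() already implies no deletions; the explicit check just documents it).
    if (count != ctx.reader().maxDoc() || ctx.reader().hasDeletions()) {
      return null; // fall back to collecting doc-by-doc
    }
    Terms terms = ctx.reader().terms(field);
    if (terms == null) {
      return Map.of();
    }
    Map<String, Long> counts = new HashMap<>();
    TermsEnum termsEnum = terms.iterator();
    for (BytesRef term = termsEnum.next(); term != null; term = termsEnum.next()) {
      // With a match-all query and no deletions, docFreq() is exactly the bucket count.
      counts.put(term.utf8ToString(), (long) termsEnum.docFreq());
    }
    return counts;
  }
}
```

Since `Weight.count()` only reports a count on segments where it can do so cheaply, a caller would still need the doc-by-doc collection path as the general fallback.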