Add query matching terms in a set #1539

trinity-1686a · 2022-09-21T14:54:58Z

fix the 2nd half of #1494

adamreichold · 2022-09-23T09:20:14Z

src/query/set_query.rs

+
+impl TermSetQuery {
+    /// Create a Term Set Query
+    pub fn new(field: Field, terms: BTreeSet<Term>) -> Self {


BTreeSet is a good data structure for a set that is continuously changing while always staying sorted, and that uses the sorting to speed up element access. In this case, it seems that the set is accessed only once to produce a weight.

Maybe it would be nicer to simply require terms: T where T: IntoIterator<Item=Term>, collect this into a Vec (where terms.into_iter().collect::<Vec<_>>() would not allocate if the iterator is created from a Vec) and sort and deduplicate this Vec once in this constructor?

I think this could yield a nicer API and simpler, hence faster, code. Then again, the difference could be insignificant and thereby not worth it, especially if one expects the terms to already be presented as a BTreeSet.
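To make the suggestion concrete, here is a minimal sketch of such a constructor. It is an illustration only: `Term` is a hypothetical stand-in for tantivy's type, and the `terms()` accessor is added purely so the result can be inspected.

```rust
// Hypothetical stand-in for tantivy's `Term`; the real type is more involved.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
pub struct Term(pub String);

pub struct TermSetQuery {
    // Sorted and deduplicated once; never mutated after construction.
    terms: Vec<Term>,
}

impl TermSetQuery {
    /// Accept anything that yields terms, then sort and deduplicate once.
    pub fn new<T: IntoIterator<Item = Term>>(terms: T) -> Self {
        let mut terms: Vec<Term> = terms.into_iter().collect();
        // `dedup` only removes adjacent duplicates, so sort first.
        terms.sort_unstable();
        terms.dedup();
        TermSetQuery { terms }
    }

    /// Accessor added for illustration only.
    pub fn terms(&self) -> &[Term] {
        &self.terms
    }
}
```

Callers can then pass a `Vec`, an array, a `BTreeSet`, or any other iterator of terms without building a dedicated set first.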

I like the BTreeSet personally.

> I like the BTreeSet personally.

I would not say that I dislike it (or any other data structure for that matter), just that it brings more to the table than is required here. I do think it is arguably more complex than say Vec which is why I tried to suggest the simplest possible data structure for this particular task.

What should I do then? Personally I prefer using a BTreeSet, but I understand an IntoIterator is easier to provide in general

As a potential user of this query*, if the type was a BTreeSet we would have to create one just for this purpose, and the query just needs immutable sorted & deduped data for which Vec suffices. I would vote for IntoIterator.

* currently doing union boolean query over tens of IDs -- looks like this should be much more efficient!

IntoIterator it is then...

and then within the function:
IntoIterator -> HashSet -> Vec -> Sort.
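A rough sketch of that pipeline, assuming a hypothetical `Term` stand-in rather than tantivy's real type. The helper name `sorted_unique_terms` is made up for this example.

```rust
use std::collections::HashSet;

// Hypothetical stand-in for tantivy's `Term`.
#[derive(Debug, Clone, PartialEq, Eq, Hash, PartialOrd, Ord)]
pub struct Term(pub String);

/// IntoIterator -> HashSet -> Vec -> sort.
pub fn sorted_unique_terms<T: IntoIterator<Item = Term>>(terms: T) -> Vec<Term> {
    // The HashSet drops duplicates, but its iteration order is unspecified.
    let unique: HashSet<Term> = terms.into_iter().collect();
    // Move the survivors into a Vec and sort them once.
    let mut sorted: Vec<Term> = unique.into_iter().collect();
    sorted.sort_unstable();
    sorted
}
```

Compared with sorting a `Vec` and calling `dedup`, this trades an extra hash set for sorting fewer elements when the input contains many duplicates; which one is faster in practice would need measuring.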

codecov-commenter · 2022-09-24T08:53:56Z

Codecov Report

Merging #1539 (9648495) into main (d641979) will decrease coverage by 0.08%.
The diff coverage is 95.91%.

@@            Coverage Diff             @@
##             main    #1539      +/-   ##
==========================================
- Coverage   93.92%   93.83%   -0.09%     
==========================================
  Files         249      251       +2     
  Lines       45903    46264     +361     
==========================================
+ Hits        43114    43414     +300     
- Misses       2789     2850      +61

Impacted Files	Coverage Δ
src/query/mod.rs	`100.00% <ø> (ø)`
src/query/set_query.rs	`95.91% <95.91%> (ø)`
src/fastfield/bytes/reader.rs	`70.58% <0.00%> (-6.84%)`	⬇️
src/indexer/delete_queue.rs	`94.76% <0.00%> (-3.46%)`	⬇️
src/fastfield/multivalued/reader.rs	`91.07% <0.00%> (-2.51%)`	⬇️
src/directory/mmap_directory.rs	`90.37% <0.00%> (-0.71%)`	⬇️
common/src/serialize.rs	`85.89% <0.00%> (-0.09%)`	⬇️
common/src/bitset.rs	`98.48% <0.00%> (-0.02%)`	⬇️
src/schema/mod.rs	`100.00% <0.00%> (ø)`
src/fastfield/mod.rs	`99.71% <0.00%> (ø)`
... and 11 more


src/query/set_query.rs

fulmicoton

Please have a look at the clippy stuff too.

src/query/set_query.rs

support using different fields; use less and_then and more if-let

trinity-1686a · 2022-09-27T10:05:24Z

Should I deprecate BooleanQuery::new_multiterms_query? It does the same thing, but is probably slower as soon as more than one term per field is being searched.

fulmicoton · 2022-09-27T12:35:26Z

src/query/set_query.rs

+impl TermSetQuery {
+    /// Create a Term Set Query
+    pub fn new<T: IntoIterator<Item = Term>>(terms: T) -> Self {
+        let mut terms_map: HashMap<_, Vec<_>> = terms


let mut terms_map: HashMap<_, Vec<_>> = HashMap::default();
for term in terms {
    // Group each term under the field it belongs to.
    terms_map.entry(term.field()).or_default().push(term);
}

is much easier to read IMHO
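For reference, a self-contained sketch of that loop. `Field` and `Term` here are simplified, hypothetical stand-ins that only mirror the `field()` and `from_field_text` names from tantivy, and `group_by_field` is a made-up helper name.

```rust
use std::collections::HashMap;

// Simplified stand-ins for tantivy's `Field` and `Term`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct Field(pub u32);

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Term {
    field: Field,
    text: String,
}

impl Term {
    pub fn from_field_text(field: Field, text: &str) -> Self {
        Term { field, text: text.to_string() }
    }

    pub fn field(&self) -> Field {
        self.field
    }
}

/// Group terms by the field they target, using the entry API.
pub fn group_by_field<T: IntoIterator<Item = Term>>(terms: T) -> HashMap<Field, Vec<Term>> {
    let mut terms_map: HashMap<Field, Vec<Term>> = HashMap::new();
    for term in terms {
        // `field()` returns a `Copy` value, so `term` can still be moved afterwards.
        terms_map.entry(term.field()).or_default().push(term);
    }
    terms_map
}
```

The explicit `Vec` value type matters: without it the compiler cannot infer what `or_default()` should create.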

src/query/set_query.rs

fulmicoton

Great job!

trinity-1686a added 2 commits September 21, 2022 14:09

implement TermSetQuery

48b92cc

add test for TermSetQuery

c4ae7e9

adamreichold reviewed Sep 23, 2022


trinity-1686a added 2 commits September 23, 2022 16:12

accept IntoIterator instead of just BTreeSet

bb5d87d

fix formatting

dac6ddb