-
-
Notifications
You must be signed in to change notification settings - Fork 672
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add query matching terms in a set #1539
Conversation
src/query/set_query.rs
Outdated
|
||
impl TermSetQuery { | ||
/// Create a Term Set Query | ||
pub fn new(field: Field, terms: BTreeSet<Term>) -> Self { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTreeSet
is a good data structure for a set that is continuously changing while always staying sorted and using the sorting to speed element access. In this case, it seems that the set is accessed only once to produce a weight.
Maybe it would be nicer to simply require terms: T where T: IntoIterator<Item=Term>
, collect this into a Vec
(where terms.into_iter().collect::<Vec<_>>()
would not allocate if the iterator is created from a Vec
) and sort and deduplicate this Vec
once in this constructor?
I think this could yield nicer API and simpler and hence faster code, but then again it could be insignificant and thereby not worth it especially if one expects the terms to be already presented as a BTreeSet
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the BTreeSet personally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the BTreeSet personally.
I would not say that I dislike it (or any other data structure for that matter), just that it brings more to the table than is required here. I do think it is arguably more complex than say Vec
which is why I tried to suggest the simplest possible data structure for this particular task.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What should I do then? Personally I prefer using a BTreeSet
, but I understand an IntoIterator
is easier to provide in general
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a potential user of this query*
, if the type was a BTreeSet
we would have to create one just for this purpose, and the query just needs immutable sorted & deduped data for which Vec
suffices. I would vote for IntoIterator
.
*
currently doing union boolean query over tens of IDs -- looks like this should be much more efficient!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IntoIterator it is then...
and then within the function:
IntoIterator -> HashSet -> Vec -> Sort.
Codecov Report
@@ Coverage Diff @@
## main #1539 +/- ##
==========================================
- Coverage 93.92% 93.83% -0.09%
==========================================
Files 249 251 +2
Lines 45903 46264 +361
==========================================
+ Hits 43114 43414 +300
- Misses 2789 2850 +61
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pleae have a look at the clippy stuff too.
support using different fields use less and_then and more if-let
Should I depreciate |
src/query/set_query.rs
Outdated
impl TermSetQuery { | ||
/// Create a Term Set Query | ||
pub fn new<T: IntoIterator<Item = Term>>(terms: T) -> Self { | ||
let mut terms_map: HashMap<_, Vec<_>> = terms |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let mut terms_map = HashMap::default();
for term in terms {
terms_map.entry(field).or_default().push(term);
}
is much easier to read IMHO
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great job!
fix the 2nd half of #1494