-
Notifications
You must be signed in to change notification settings - Fork 25.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add intervals query #36135
Add intervals query #36135
Conversation
Pinging @elastic/es-search |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The change looks good overall @romseygeek , I left some comments.
@@ -0,0 +1,58 @@ | |||
setup: | |||
- skip: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this working ? I was not aware that the skip
section can be set in the setup
part but if it's working... ;)
if (posAtt.getPositionIncrement() == 1) { | ||
if (synonyms.size() == 1) { | ||
terms.add(synonyms.get(0)); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: can you add an else
to desambiguate ?
import java.util.Objects; | ||
|
||
import static org.elasticsearch.common.xcontent.ConstructingObjectParser.constructorArg; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add javadocs to explain the kind of queries that this builder handles ?
|
||
import static org.elasticsearch.common.xcontent.ConstructingObjectParser.constructorArg; | ||
import static org.elasticsearch.common.xcontent.ConstructingObjectParser.optionalConstructorArg; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here, this is the user API so I'd expect some explanations regarding the different options ?
server/src/main/java/org/elasticsearch/search/SearchModule.java
Outdated
Show resolved
Hide resolved
I've added some better YAML tests and some basic documentation in the query DSL reference. I'm still a bit unsure of the DSL interface and some of the terms I'm using (eg |
rest
Outdated
|
||
./gradlew :distribution:archives:integ-test-zip:integTest \ | ||
-Dtests.class="org.elasticsearch.test.rest.*Yaml*IT" \ | ||
-Dtests.method="test {p0=$1}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder what is the role of this file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It makes running rest tests easier, but I didn't mean to commit it :) Will remove
After conferring with @clintongormley I've reworked the API somewhat. My one reservation is the name of the query, which is still |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is very clean overall. I'm curious why you decided to support a filter
element in every intervals source instead of eg. having a filter
intervals source?
Regarding the name, I don't dislike "intervals" as this is their name in the paper that introduced them. If we give them a more general name, I'm afraid that users would be even more surprised eg. by the behavior of any_of
as it only returns minimal intervals?
favourite food is porridge` would not match, because the interval matching | ||
`cold porridge` starts before the interval matching `my favourite food`. | ||
|
||
[[intervals-match]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Huge +1 to exposing a match and no term
@Override | ||
public IntervalsSource intervals(String text, int maxGaps, boolean ordered, NamedAnalyzer analyzer) throws IOException { | ||
if (indexOptions().compareTo(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS) < 0) { | ||
throw new IllegalArgumentException("Cannot create source against field [" + name() + "] with no positions indexed"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe say "intervals source" instead of just source?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
++
/** | ||
* Constructs an IntervalsSource based on analyzed text | ||
*/ | ||
public class IntervalBuilder { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we make it pkg-private?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's called from TextFieldMapper which is in a different package, so unfortunately we can't.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we move it to the same package to keep visibility to a minimum?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's also called from IntervalsSourceProvider, which doesn't really fit in the mapper
package. I think it has to stay public...
default: | ||
provider = IntervalsSourceProvider.fromXContent(parser); | ||
|
||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it seems that this won't fail if multiple providers are provided?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have updated
MappedFieldType fieldType = context.fieldMapper(field); | ||
if (fieldType == null) { | ||
throw new IllegalArgumentException("Cannot create IntervalQuery over non-existent field [" + field + "]"); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in general we are lenient with unmapped fields because of cross-index search, should we be lenient here too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have updated
The `or` rule will match any of its nested sub-rules. | ||
|
||
[horizontal] | ||
`intervals`:: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we need to add documentation here about the fact that this only returns minimal intervals and consequences?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added a section on minimization at the end of the doc, with some examples of queries that can produce surprising results, and how to deal with that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
awesome
I've pushed some changes addressing your comments @jpountz. For the filtering, it seemed to make more sense to add it as an option on to each rule, because the intervals produced by a filter all belong to the rule being filtered, and Clint and I thought that this was the best way of making that obvious. |
@elasticmachine retest this please |
1 similar comment
@elasticmachine retest this please |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @romseygeek. I thought having some sort of filtering intervals would be more consistent with how we deal with queries but I don't feel too strongly about it either.
LGTM
/** | ||
* Constructs an IntervalsSource based on analyzed text | ||
*/ | ||
public class IntervalBuilder { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we move it to the same package to keep visibility to a minimum?
The `or` rule will match any of its nested sub-rules. | ||
|
||
[horizontal] | ||
`intervals`:: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
awesome
@jpountz I didn't have strong reasons for adding filtering the way it is here. Happy to discuss if you think it makes sense doing it in a different way. |
This commit exposes the lucene intervals query in elasticsearch, as a replacement for the
Span query family. Instead of building query structures up from individual terms, the intervals query
uses a tree of sources, the leaves of which are formed of
match
-type queries that are passedthrough analysis. These can then be combined using
or
,combine
orrelate
sources to testfor proximity or position.
Replaces #32406.
Closes #29636