Expose phrase-prefix queries via the built-in query parser #2044

adamreichold · 2023-05-15T17:26:12Z

This proposes the less-than-imaginative syntax field:"phrase ter"* to perform a phrase prefix query against field using phrase and ter as the terms. The aim of this is to make this type of query more discoverable and simplify manual testing.

I did consider exposing the max_expansions parameter similar to how slop is handled, but I think that this is rather something that should be configured via the querser parser (similar to set_field_boost and set_field_fuzzy) as choosing it requires rather intimiate knowledge of the backing index.

fulmicoton · 2023-05-18T01:22:16Z

@adamreichold is this something that you need?
(this does seem useful...)

adamreichold · 2023-05-18T06:23:34Z

@adamreichold is this something that you need? (this does seem useful...)

I would say that it is something that I want: I have an internal UI for testing purposes which just uses the built-in query parser. Hence, if we want to try out some functionality before we commit to integrating it, it is easiest to expose it in the built-in query parser so the data people can play with it.

I also thought that it is a rather straight-forward extension of the existing support for phrase queries and slop that does not really introduce anything new conceptually. So the maintenance effort should be limited making this a reasonable trade-off?

codecov-commenter · 2023-05-19T06:55:48Z

Codecov Report

Merging #2044 (557af61) into main (62709b8) will increase coverage by 0.00%.
The diff coverage is 100.00%.

❗ Your organization is not using the GitHub App Integration. As a result you may experience degraded service beginning May 15th. Please install the Github App Integration for your organization. Read more.

@@           Coverage Diff           @@
##             main    #2044   +/-   ##
=======================================
  Coverage   94.43%   94.43%           
=======================================
  Files         319      319           
  Lines       59689    59750   +61     
=======================================
+ Hits        56365    56425   +60     
- Misses       3324     3325    +1

Impacted Files	Coverage Δ
query-grammar/src/query_grammar.rs	`99.73% <100.00%> (+<0.01%)`	⬆️
query-grammar/src/user_input_ast.rs	`97.29% <100.00%> (+0.04%)`	⬆️
src/query/query_parser/logical_ast.rs	`82.81% <100.00%> (+1.45%)`	⬆️
src/query/query_parser/query_parser.rs	`94.77% <100.00%> (+0.16%)`	⬆️

... and 3 files with indirect coverage changes

fulmicoton · 2023-05-20T09:37:30Z

@adamreichold that makes sense... Let's see if we can slip that into the next tantivy release.

fulmicoton · 2023-05-24T07:00:43Z

@PSeitz can you take this review?

PSeitz · 2023-05-24T09:01:38Z

@adamreichold Can you add a test with data?

What happens for "wo"*

adamreichold · 2023-05-24T09:53:02Z

Can you add a test with data?

I believe those would not be added to the query parser tests but rather to the implementation of phrase-prefix weight (as with the other query types). But that weight already has tests using an index with data. Is there anything specific missing from those tests that should be added?

What happens for "wo"*

Nice catch! I added an explicit error for that case as phrase-prefix queries only really make sense with at least two terms (in contrast to phrase queries which sensibly "downgrade" to term queries).

adamreichold · 2023-05-29T14:54:14Z

I believe those would not be added to the query parser tests but rather to the implementation of phrase-prefix weight (as with the other query types). But that weight already has tests using an index with data. Is there anything specific missing from those tests that should be added?

@PSeitz Friendly ping on the above question.

PSeitz · 2023-05-31T03:12:31Z

Can you add a test with data?

I believe those would not be added to the query parser tests but rather to the implementation of phrase-prefix weight (as with the other query types). But that weight already has tests using an index with data. Is there anything specific missing from those tests that should be added?

I was thinking of an end to end test, that makes sure everything is correctly connected. It's also a form of documentation and helps discovering features

This proposes the less-than-imaginative syntax `field:"phrase ter"*` to perform a phrase prefix query against `field` using `phrase` and `ter` as the terms. The aim of this is to make this type of query more discoverable and simplify manual testing. I did consider exposing the `max_expansions` parameter similar to how slop is handled, but I think that this is rather something that should be configured via the querser parser (similar to `set_field_boost` and `set_field_fuzzy`) as choosing it requires rather intimiate knowledge of the backing index.

adamreichold · 2023-05-31T05:47:18Z

I was thinking of an end to end test, that makes sure everything is correctly connected. It's also a form of documentation and helps discovering features

With this motivation in mind, I added a dedicated example as the discoverability of unit tests is limited and adding a doc test to QueryParser, but only for phrase-prefix queries would most likely confuse people reading that comment for the first time.

src/query/query_parser/query_parser.rs

…e query parser.

…ture discoverability.

adamreichold · 2023-05-31T14:11:33Z

I replaced the variable name option by indexing_options as suggested in both cases.

PSeitz

LGTM Thanks!

fulmicoton requested review from fulmicoton and PSeitz May 20, 2023 09:37

PSeitz reviewed May 31, 2023

View reviewed changes

src/query/query_parser/query_parser.rs Outdated Show resolved Hide resolved

PSeitz reviewed May 31, 2023

View reviewed changes

src/query/query_parser/query_parser.rs Outdated Show resolved Hide resolved

adamreichold added 2 commits May 31, 2023 16:10

Prevent construction of zero or one term phrase-prefix queries via th…

505c432

…e query parser.

Add example using phrase-prefix search via surface API to improve fea…

74b7470

…ture discoverability.

PSeitz approved these changes Jun 1, 2023

View reviewed changes

PSeitz merged commit b325d56 into quickwit-oss:main Jun 1, 2023

adamreichold deleted the query-parser-phrase-prefix branch June 1, 2023 11:42

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Expose phrase-prefix queries via the built-in query parser #2044

Expose phrase-prefix queries via the built-in query parser #2044

adamreichold commented May 15, 2023

fulmicoton commented May 18, 2023 •

edited

Loading

adamreichold commented May 18, 2023

codecov-commenter commented May 19, 2023

fulmicoton commented May 20, 2023

fulmicoton commented May 24, 2023

PSeitz commented May 24, 2023

adamreichold commented May 24, 2023 •

edited

Loading

adamreichold commented May 29, 2023

PSeitz commented May 31, 2023

adamreichold commented May 31, 2023

adamreichold commented May 31, 2023 •

edited

Loading

PSeitz left a comment

Expose phrase-prefix queries via the built-in query parser #2044

Expose phrase-prefix queries via the built-in query parser #2044

Conversation

adamreichold commented May 15, 2023

fulmicoton commented May 18, 2023 • edited Loading

adamreichold commented May 18, 2023

codecov-commenter commented May 19, 2023

Codecov Report

fulmicoton commented May 20, 2023

fulmicoton commented May 24, 2023

PSeitz commented May 24, 2023

adamreichold commented May 24, 2023 • edited Loading

adamreichold commented May 29, 2023

PSeitz commented May 31, 2023

adamreichold commented May 31, 2023

adamreichold commented May 31, 2023 • edited Loading

PSeitz left a comment

Choose a reason for hiding this comment

fulmicoton commented May 18, 2023 •

edited

Loading

adamreichold commented May 24, 2023 •

edited

Loading

adamreichold commented May 31, 2023 •

edited

Loading