Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix match_phrase_prefix_query not working on text field with multiple values and index_prefixes #10959

Merged
merged 25 commits into from
Jul 9, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
25 commits
Select commit Hold shift + click to select a range
c9a27b0
Fix match_phrase_prefix_query not working on text field with multiple…
gaobinlong Oct 23, 2023
428c851
Add more test
gaobinlong Oct 27, 2023
38aca33
modify change log
gaobinlong Oct 27, 2023
7e9261c
Fix test failure
gaobinlong Oct 30, 2023
7c90dda
Merge remote-tracking branch 'upstream/main' into fix_match_phrase_pr…
gaobinlong Oct 30, 2023
dd24556
merge main
gaobinlong Oct 31, 2023
06b7e5b
Merge remote-tracking branch 'upstream/main' into fix_match_phrase_pr…
gaobinlong Nov 2, 2023
2df12f1
Merge remote-tracking branch 'upstream/main' into fix_match_phrase_pr…
gaobinlong Nov 3, 2023
84afd89
Change the indexAnalyzer used by prefix field
gaobinlong Nov 3, 2023
6da2749
Skip old version for yaml test
gaobinlong Nov 6, 2023
1a14caf
Merge remote-tracking branch 'upstream/main' into fix_match_phrase_pr…
gaobinlong Nov 6, 2023
6444490
Merge remote-tracking branch 'upstream/main' into fix_match_phrase_pr…
gaobinlong Nov 15, 2023
1c3d724
Merge remote-tracking branch 'upstream/main' into fix_match_phrase_pr…
gaobinlong Nov 16, 2023
4d4484d
Optimize some code
gaobinlong Nov 16, 2023
763099a
merge main
gaobinlong Jan 4, 2024
8874093
Fix test failure
gaobinlong Jan 5, 2024
251bd44
Merge remote-tracking branch 'upstream/main' into fix_match_phrase_pr…
gaobinlong Jan 5, 2024
91d61f6
Merge remote-tracking branch 'upstream/main' into fix_match_phrase_pr…
gaobinlong Jan 8, 2024
476d2e7
merge main
gaobinlong Apr 15, 2024
349a406
Modify yaml test description
gaobinlong Apr 15, 2024
f510415
merge main
gaobinlong Apr 18, 2024
3c5bd2f
Merge remote-tracking branch 'upstream/main' into fix_match_phrase_pr…
gaobinlong Apr 18, 2024
1567e61
merge main
gaobinlong Jun 19, 2024
c704c97
Remove the name parameter for setAnalyzer()
gaobinlong Jun 19, 2024
1240bc4
merge main
gaobinlong Jun 25, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -38,6 +38,7 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
- Fix bug in SBP cancellation logic ([#13259](https://github.com/opensearch-project/OpenSearch/pull/13474))
- Fix handling of Short and Byte data types in ScriptProcessor ingest pipeline ([#14379](https://github.com/opensearch-project/OpenSearch/issues/14379))
- Switch to iterative version of WKT format parser ([#14086](https://github.com/opensearch-project/OpenSearch/pull/14086))
- Fix match_phrase_prefix_query not working on text field with multiple values and index_prefixes ([#10959](https://github.com/opensearch-project/OpenSearch/pull/10959))
- Fix the computed max shards of cluster to avoid int overflow ([#14155](https://github.com/opensearch-project/OpenSearch/pull/14155))
- Fixed rest-high-level client searchTemplate & mtermVectors endpoints to have a leading slash ([#14465](https://github.com/opensearch-project/OpenSearch/pull/14465))
- Write shard level metadata blob when snapshotting searchable snapshot indexes ([#13190](https://github.com/opensearch-project/OpenSearch/pull/13190))
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -10,7 +10,12 @@ setup:
index_prefixes:
min_chars: 2
max_chars: 5

text_with_pos_inc_gap:
type: text
position_increment_gap: 201
index_prefixes:
min_chars: 2
max_chars: 5
- do:
index:
index: test
Expand All @@ -23,6 +28,18 @@ setup:
id: 2
body: { text: sentence with UPPERCASE WORDS }

- do:
index:
index: test
id: 3
body: { text: ["foo", "b-12"] }

- do:
index:
index: test
id: 4
body: { text_with_pos_inc_gap: ["foo", "b-12"] }

- do:
indices.refresh:
index: [test]
Expand Down Expand Up @@ -116,3 +133,36 @@ setup:
]

- match: {hits.total: 1}

# related issue: https://github.com/opensearch-project/OpenSearch/issues/9203
---
"search index prefixes with multiple values":
- skip:
version: " - 2.99.99"
reason: "the bug was fixed in 3.0.0"
- do:
search:
rest_total_hits_as_int: true
index: test
body:
query:
match_phrase_prefix:
text: "b-12"

- match: {hits.total: 1}

---
"search index prefixes with multiple values and custom position_increment_gap":
- skip:
version: " - 2.99.99"
reason: "the bug was fixed in 3.0.0"
- do:
search:
rest_total_hits_as_int: true
index: test
body:
query:
match_phrase_prefix:
text_with_pos_inc_gap: "b-12"

- match: {hits.total: 1}
Original file line number Diff line number Diff line change
Expand Up @@ -448,7 +448,6 @@ protected PrefixFieldMapper buildPrefixMapper(BuilderContext context, FieldType
pft.setStoreTermVectorOffsets(true);
}
PrefixFieldType prefixFieldType = new PrefixFieldType(tft, fullName + "._index_prefix", indexPrefixes.get());
prefixFieldType.setAnalyzer(analyzers.getIndexAnalyzer());
tft.setPrefixFieldType(prefixFieldType);
return new PrefixFieldMapper(pft, prefixFieldType);
}
Expand Down Expand Up @@ -522,19 +521,26 @@ private static class PrefixWrappedAnalyzer extends AnalyzerWrapper {
private final int minChars;
private final int maxChars;
private final Analyzer delegate;
private final int positionIncrementGap;

PrefixWrappedAnalyzer(Analyzer delegate, int minChars, int maxChars) {
PrefixWrappedAnalyzer(Analyzer delegate, int minChars, int maxChars, int positionIncrementGap) {
super(delegate.getReuseStrategy());
this.delegate = delegate;
this.minChars = minChars;
this.maxChars = maxChars;
this.positionIncrementGap = positionIncrementGap;
}

@Override
protected Analyzer getWrappedAnalyzer(String fieldName) {
return delegate;
}

@Override
public int getPositionIncrementGap(String fieldName) {
return positionIncrementGap;
}
msfroh marked this conversation as resolved.
Show resolved Hide resolved

@Override
protected TokenStreamComponents wrapComponents(String fieldName, TokenStreamComponents components) {
TokenFilter filter = new EdgeNGramTokenFilter(components.getTokenStream(), minChars, maxChars, false);
Expand Down Expand Up @@ -588,17 +594,18 @@ static final class PrefixFieldType extends StringFieldType {

final int minChars;
final int maxChars;
final TextFieldType parentField;
final TextFieldType parent;

PrefixFieldType(TextFieldType parentField, String name, PrefixConfig config) {
this(parentField, name, config.minChars, config.maxChars);
}

PrefixFieldType(TextFieldType parentField, String name, int minChars, int maxChars) {
super(name, true, false, false, parentField.getTextSearchInfo(), Collections.emptyMap());
PrefixFieldType(TextFieldType parent, String name, int minChars, int maxChars) {
super(name, true, false, false, parent.getTextSearchInfo(), Collections.emptyMap());
this.minChars = minChars;
this.maxChars = maxChars;
this.parentField = parentField;
this.parent = parent;
setAnalyzer(parent.indexAnalyzer());
}

@Override
Expand All @@ -609,8 +616,13 @@ public ValueFetcher valueFetcher(QueryShardContext context, SearchLookup searchL
}

void setAnalyzer(NamedAnalyzer delegate) {
String analyzerName = delegate.name();
setIndexAnalyzer(
new NamedAnalyzer(delegate.name(), AnalyzerScope.INDEX, new PrefixWrappedAnalyzer(delegate.analyzer(), minChars, maxChars))
new NamedAnalyzer(
analyzerName,
AnalyzerScope.INDEX,
new PrefixWrappedAnalyzer(delegate.analyzer(), minChars, maxChars, delegate.getPositionIncrementGap(analyzerName))
)
);
}

Expand Down Expand Up @@ -639,7 +651,7 @@ public Query prefixQuery(String value, MultiTermQuery.RewriteMethod method, bool
Automaton automaton = Operations.concatenate(automata);
AutomatonQuery query = AutomatonQueries.createAutomatonQuery(new Term(name(), value + "*"), automaton, method);
return new BooleanQuery.Builder().add(query, BooleanClause.Occur.SHOULD)
.add(new TermQuery(new Term(parentField.name(), value)), BooleanClause.Occur.SHOULD)
.add(new TermQuery(new Term(parent.name(), value)), BooleanClause.Occur.SHOULD)
.build();
}

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -380,6 +380,57 @@ public void testIndexOptions() throws IOException {
}
}

public void testPositionIncrementGapOnIndexPrefixField() throws IOException {
// test default position_increment_gap
MapperService mapperService = createMapperService(
fieldMapping(b -> b.field("type", "text").field("analyzer", "default").startObject("index_prefixes").endObject())
);
ParsedDocument doc = mapperService.documentMapper().parse(source(b -> b.array("field", new String[] { "a", "b 12" })));

withLuceneIndex(mapperService, iw -> iw.addDocument(doc.rootDoc()), reader -> {
TermsEnum terms = getOnlyLeafReader(reader).terms("field").iterator();
assertTrue(terms.seekExact(new BytesRef("12")));
PostingsEnum postings = terms.postings(null, PostingsEnum.POSITIONS);
assertEquals(0, postings.nextDoc());
assertEquals(TextFieldMapper.Defaults.POSITION_INCREMENT_GAP + 2, postings.nextPosition());
});

withLuceneIndex(mapperService, iw -> iw.addDocument(doc.rootDoc()), reader -> {
TermsEnum terms = getOnlyLeafReader(reader).terms("field._index_prefix").iterator();
assertTrue(terms.seekExact(new BytesRef("12")));
PostingsEnum postings = terms.postings(null, PostingsEnum.POSITIONS);
assertEquals(0, postings.nextDoc());
assertEquals(TextFieldMapper.Defaults.POSITION_INCREMENT_GAP + 2, postings.nextPosition());
});

// test custom position_increment_gap
final int positionIncrementGap = randomIntBetween(1, 1000);
MapperService mapperService2 = createMapperService(
fieldMapping(
b -> b.field("type", "text")
.field("position_increment_gap", positionIncrementGap)
.field("analyzer", "default")
.startObject("index_prefixes")
.endObject()
)
);
ParsedDocument doc2 = mapperService2.documentMapper().parse(source(b -> b.array("field", new String[] { "a", "b 12" })));
withLuceneIndex(mapperService2, iw -> iw.addDocument(doc2.rootDoc()), reader -> {
TermsEnum terms = getOnlyLeafReader(reader).terms("field").iterator();
assertTrue(terms.seekExact(new BytesRef("12")));
PostingsEnum postings = terms.postings(null, PostingsEnum.POSITIONS);
assertEquals(0, postings.nextDoc());
assertEquals(positionIncrementGap + 2, postings.nextPosition());
});
withLuceneIndex(mapperService2, iw -> iw.addDocument(doc2.rootDoc()), reader -> {
TermsEnum terms = getOnlyLeafReader(reader).terms("field._index_prefix").iterator();
assertTrue(terms.seekExact(new BytesRef("12")));
PostingsEnum postings = terms.postings(null, PostingsEnum.POSITIONS);
assertEquals(0, postings.nextDoc());
assertEquals(positionIncrementGap + 2, postings.nextPosition());
});
}

public void testDefaultPositionIncrementGap() throws IOException {
MapperService mapperService = createMapperService(fieldMapping(this::minimalMapping));
ParsedDocument doc = mapperService.documentMapper().parse(source(b -> b.array("field", new String[] { "a", "b" })));
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -167,6 +167,7 @@ public void testFuzzyQuery() {

public void testIndexPrefixes() {
TextFieldType ft = createFieldType(true);
ft.setIndexAnalyzer(Lucene.STANDARD_ANALYZER);
reta marked this conversation as resolved.
Show resolved Hide resolved
ft.setPrefixFieldType(new TextFieldMapper.PrefixFieldType(ft, "field._index_prefix", 2, 10));

Query q = ft.prefixQuery("goin", CONSTANT_SCORE_REWRITE, false, randomMockShardContext());
Expand Down
Loading