-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
semantic_text: Add exists query #110027
semantic_text: Add exists query #110027
Conversation
...nference/src/main/java/org/elasticsearch/xpack/inference/mapper/SemanticTextFieldMapper.java
Outdated
Show resolved
Hide resolved
@@ -178,36 +181,10 @@ public void testDynamicUpdate() throws IOException { | |||
final String fieldName = "semantic"; | |||
final String inferenceId = "test_service"; | |||
|
|||
MapperService mapperService = createMapperService(mapping(b -> {})); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I extracted this as a method to enable updating the mapping to check the actual exists queries created
assertThat(toParentBlockJoinQuery.getChildQuery(), instanceOf(FieldExistsQuery.class)); | ||
} | ||
|
||
private MapperService mapperServiceForFieldWithModelSettings( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... and this is the extracted method
Pinging @elastic/es-search (Team:Search) |
...nference/src/main/java/org/elasticsearch/xpack/inference/mapper/SemanticTextFieldMapper.java
Outdated
Show resolved
Hide resolved
...nce/src/test/java/org/elasticsearch/xpack/inference/mapper/SemanticTextFieldMapperTests.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left two suggestions.
...nference/src/main/java/org/elasticsearch/xpack/inference/mapper/SemanticTextFieldMapper.java
Outdated
Show resolved
Hide resolved
...nference/src/main/java/org/elasticsearch/xpack/inference/mapper/SemanticTextFieldMapper.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
|
||
/** | ||
* Returns the primitive Lucene query for a nested query given the primitive query to wrap | ||
* @param innerQuery query to wraqp in a nested query |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* @param innerQuery query to wraqp in a nested query | |
* @param innerQuery query to wrap in a nested query |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
corrqected 😝
server/src/main/java/org/elasticsearch/index/query/NestedQueryBuilder.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure the doToQuery
and toQuery
logic for the rewrite is 100% correct.
@@ -260,6 +260,34 @@ protected int doHashCode() { | |||
|
|||
@Override | |||
protected Query doToQuery(SearchExecutionContext context) throws IOException { | |||
Query innerQuery; | |||
NestedObjectMapper mapper = context.nestedLookup().getNestedMappers().get(path); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think all checks that were previously done in toQuery
before this nested lookup need to occur.
Including
NestedObjectMapper mapper = context.nestedLookup().getNestedMappers().get(path);
if (mapper == null) {
if (ignoreUnmapped) {
return new MatchNoDocsQuery();
} else {
throw new IllegalStateException("[" + NAME + "] failed to find nested object under path [" + path + "]");
}
}
And
if (context.allowExpensiveQueries() == false) {
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Totally - I moved the meat out of the sandwich. I have corrected it now.
...nference/src/main/java/org/elasticsearch/xpack/inference/mapper/SemanticTextFieldMapper.java
Show resolved
Hide resolved
* @return the primitive Lucene query | ||
*/ | ||
public static <E extends Exception> Query toQuery( | ||
CheckedFunction<SearchExecutionContext, Query, E> queryProvider, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great CheckedFunction
usage!
Mapper mapper = mapperService.mappingLookup().getMapper(fieldName); | ||
assertNotNull(mapper); | ||
Query existsQuery = ((SemanticTextFieldMapper) mapper).fieldType().existsQuery(createSearchExecutionContext(mapperService)); | ||
assertThat(existsQuery, instanceOf(ESToParentBlockJoinQuery.class)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add back the inner query check after this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking that's too much coupling and details - it's not about checking the bool query with two clauses that are resulting from this, but the actual need to do so given we have YAML coverage. We're functionally covered, and we probably shouldn't care about the actual details of the query rewriting.
Happy to add the checks if you think it's absolutely needed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that hard-coding the expected query class creates too much coupling. However, as written testExistsQuerySparseVector
& testExistsQueryDenseVector
are not meaningfully different. In particular, the lack of assertions on the child query means that we're not testing that the child query is reflective of the backing embedding field type (sparse_vector
or dense_vector
).
I thought about how to resolve this without creating unmaintainable couplings and came up with this approach:
MapperBuilderContext mapperBuilderContext = MapperBuilderContext.root(false, false);
SearchExecutionContext searchExecutionContext = createSearchExecutionContext(mapperService);
SparseVectorFieldMapper sparseVectorFieldMapper = new SparseVectorFieldMapper.Builder(SemanticTextField.getEmbeddingsFieldName(fieldName)).build(mapperBuilderContext);
...
assertThat(toParentBlockJoinQuery.getChildQuery(), equalTo(sparseVectorFieldMapper.fieldType().existsQuery(searchExecutionContext)));
And something similar for testExistsQueryDenseVector
, except creating a DenseVectorFieldMapper
. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like that, I think it's a good middle ground!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mapper mapper = mapperService.mappingLookup().getMapper(fieldName); | ||
assertNotNull(mapper); | ||
Query existsQuery = ((SemanticTextFieldMapper) mapper).fieldType().existsQuery(createSearchExecutionContext(mapperService)); | ||
assertThat(existsQuery, instanceOf(ESToParentBlockJoinQuery.class)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add back the inner query check after this?
…-query # Conflicts: # x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/mapper/SemanticTextFieldMapperTests.java
Query expectedQuery = NestedQueryBuilder.toQuery( | ||
context -> sparseVectorFieldMapper.fieldType().existsQuery(context), | ||
SemanticTextField.getChunksFieldName(fieldName), | ||
toParentBlockJoinQuery.getScoreMode(), | ||
false, | ||
searchExecutionContext | ||
); | ||
assertThat(toParentBlockJoinQuery, equalTo(expectedQuery)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see what you mean. WDYT about something like this?
Query expectedQuery = NestedQueryBuilder.toQuery( | |
context -> sparseVectorFieldMapper.fieldType().existsQuery(context), | |
SemanticTextField.getChunksFieldName(fieldName), | |
toParentBlockJoinQuery.getScoreMode(), | |
false, | |
searchExecutionContext | |
); | |
assertThat(toParentBlockJoinQuery, equalTo(expectedQuery)); | |
assertThat(toParentBlockJoinQuery.getChildQuery(), instanceOf(BooleanQuery.class)); | |
BooleanQuery booleanQuery = (BooleanQuery) toParentBlockJoinQuery.getChildQuery(); | |
assertTrue( | |
booleanQuery.clauses() | |
.stream() | |
.anyMatch(b -> b.getQuery().equals(sparseVectorFieldMapper.fieldType().existsQuery(searchExecutionContext))) | |
); |
We still need to know that we're creating a boolean query under the hood (which isn't entirely bad IMO), but we're duplicating less.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still find this very leaky. Given that we're already testing the query with the YAML tests, I'd opt for the simplest solution here, and just asserting the top ESToParentBlockJoinQuery
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough. Thanks for iterating on this with me!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was a good idea Mike, thanks for sharing!
Closes elastic/obs-ai-assistant-team#162 Closes #192757 This replaces the ML inference pipeline with `semantic_text` and adds a migration task that runs automatically when Kibana starts. Blocked by: - elastic/elasticsearch#110027 - elastic/elasticsearch#110033 - elastic/ml-team#1298
…ic#186499) Closes elastic/obs-ai-assistant-team#162 Closes elastic#192757 This replaces the ML inference pipeline with `semantic_text` and adds a migration task that runs automatically when Kibana starts. Blocked by: - elastic/elasticsearch#110027 - elastic/elasticsearch#110033 - elastic/ml-team#1298 (cherry picked from commit 671ff30) # Conflicts: # x-pack/plugins/translations/translations/zh-CN.json
…ic#186499) Closes elastic/obs-ai-assistant-team#162 Closes elastic#192757 This replaces the ML inference pipeline with `semantic_text` and adds a migration task that runs automatically when Kibana starts. Blocked by: - elastic/elasticsearch#110027 - elastic/elasticsearch#110033 - elastic/ml-team#1298
…ic#186499) Closes elastic/obs-ai-assistant-team#162 Closes elastic#192757 This replaces the ML inference pipeline with `semantic_text` and adds a migration task that runs automatically when Kibana starts. Blocked by: - elastic/elasticsearch#110027 - elastic/elasticsearch#110033 - elastic/ml-team#1298
Does what it says on the tin