Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

semantic_text: Add exists query #110027

Merged

Conversation

carlosdelest
Copy link
Member

Does what it says on the tin

@@ -178,36 +181,10 @@ public void testDynamicUpdate() throws IOException {
final String fieldName = "semantic";
final String inferenceId = "test_service";

MapperService mapperService = createMapperService(mapping(b -> {}));
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I extracted this as a method to enable updating the mapping to check the actual exists queries created

assertThat(toParentBlockJoinQuery.getChildQuery(), instanceOf(FieldExistsQuery.class));
}

private MapperService mapperServiceForFieldWithModelSettings(
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... and this is the extracted method

@carlosdelest carlosdelest added >non-issue :Search Foundations/Mapping Index mappings, including merging and defining field types Team:Search Meta label for search team labels Jun 21, 2024
@carlosdelest carlosdelest marked this pull request as ready for review June 21, 2024 11:42
@carlosdelest carlosdelest requested review from Mikep86 and jimczi June 21, 2024 11:42
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-search (Team:Search)

Copy link
Contributor

@jimczi jimczi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left two suggestions.

Copy link
Contributor

@jimczi jimczi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM


/**
* Returns the primitive Lucene query for a nested query given the primitive query to wrap
* @param innerQuery query to wraqp in a nested query
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* @param innerQuery query to wraqp in a nested query
* @param innerQuery query to wrap in a nested query

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

corrqected 😝

Copy link
Member

@benwtrent benwtrent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure the doToQuery and toQuery logic for the rewrite is 100% correct.

@@ -260,6 +260,34 @@ protected int doHashCode() {

@Override
protected Query doToQuery(SearchExecutionContext context) throws IOException {
Query innerQuery;
NestedObjectMapper mapper = context.nestedLookup().getNestedMappers().get(path);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think all checks that were previously done in toQuery before this nested lookup need to occur.

Including

 NestedObjectMapper mapper = context.nestedLookup().getNestedMappers().get(path);
        if (mapper == null) {
            if (ignoreUnmapped) {
                return new MatchNoDocsQuery();
            } else {
                throw new IllegalStateException("[" + NAME + "] failed to find nested object under path [" + path + "]");
            }
        }

And

if (context.allowExpensiveQueries() == false) {

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Totally - I moved the meat out of the sandwich. I have corrected it now.

* @return the primitive Lucene query
*/
public static <E extends Exception> Query toQuery(
CheckedFunction<SearchExecutionContext, Query, E> queryProvider,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great CheckedFunction usage!

Mapper mapper = mapperService.mappingLookup().getMapper(fieldName);
assertNotNull(mapper);
Query existsQuery = ((SemanticTextFieldMapper) mapper).fieldType().existsQuery(createSearchExecutionContext(mapperService));
assertThat(existsQuery, instanceOf(ESToParentBlockJoinQuery.class));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add back the inner query check after this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking that's too much coupling and details - it's not about checking the bool query with two clauses that are resulting from this, but the actual need to do so given we have YAML coverage. We're functionally covered, and we probably shouldn't care about the actual details of the query rewriting.

Happy to add the checks if you think it's absolutely needed

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that hard-coding the expected query class creates too much coupling. However, as written testExistsQuerySparseVector & testExistsQueryDenseVector are not meaningfully different. In particular, the lack of assertions on the child query means that we're not testing that the child query is reflective of the backing embedding field type (sparse_vector or dense_vector).

I thought about how to resolve this without creating unmaintainable couplings and came up with this approach:

MapperBuilderContext mapperBuilderContext = MapperBuilderContext.root(false, false);
SearchExecutionContext searchExecutionContext = createSearchExecutionContext(mapperService);
SparseVectorFieldMapper sparseVectorFieldMapper = new SparseVectorFieldMapper.Builder(SemanticTextField.getEmbeddingsFieldName(fieldName)).build(mapperBuilderContext);

...

assertThat(toParentBlockJoinQuery.getChildQuery(), equalTo(sparseVectorFieldMapper.fieldType().existsQuery(searchExecutionContext)));

And something similar for testExistsQueryDenseVector, except creating a DenseVectorFieldMapper. WDYT?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like that, I think it's a good middle ground!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Mikep86 I've included additional checks in cabf436.

I'm not convinced on this - we're essentially re-doing what the implementation does internally. But I couldn't find another way of including the nested path in the final bool query otherwise.

Mapper mapper = mapperService.mappingLookup().getMapper(fieldName);
assertNotNull(mapper);
Query existsQuery = ((SemanticTextFieldMapper) mapper).fieldType().existsQuery(createSearchExecutionContext(mapperService));
assertThat(existsQuery, instanceOf(ESToParentBlockJoinQuery.class));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add back the inner query check after this?

…-query

# Conflicts:
#	x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/mapper/SemanticTextFieldMapperTests.java
Comment on lines 563 to 570
Query expectedQuery = NestedQueryBuilder.toQuery(
context -> sparseVectorFieldMapper.fieldType().existsQuery(context),
SemanticTextField.getChunksFieldName(fieldName),
toParentBlockJoinQuery.getScoreMode(),
false,
searchExecutionContext
);
assertThat(toParentBlockJoinQuery, equalTo(expectedQuery));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see what you mean. WDYT about something like this?

Suggested change
Query expectedQuery = NestedQueryBuilder.toQuery(
context -> sparseVectorFieldMapper.fieldType().existsQuery(context),
SemanticTextField.getChunksFieldName(fieldName),
toParentBlockJoinQuery.getScoreMode(),
false,
searchExecutionContext
);
assertThat(toParentBlockJoinQuery, equalTo(expectedQuery));
assertThat(toParentBlockJoinQuery.getChildQuery(), instanceOf(BooleanQuery.class));
BooleanQuery booleanQuery = (BooleanQuery) toParentBlockJoinQuery.getChildQuery();
assertTrue(
booleanQuery.clauses()
.stream()
.anyMatch(b -> b.getQuery().equals(sparseVectorFieldMapper.fieldType().existsQuery(searchExecutionContext)))
);

We still need to know that we're creating a boolean query under the hood (which isn't entirely bad IMO), but we're duplicating less.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still find this very leaky. Given that we're already testing the query with the YAML tests, I'd opt for the simplest solution here, and just asserting the top ESToParentBlockJoinQuery.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough. Thanks for iterating on this with me!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was a good idea Mike, thanks for sharing!

@carlosdelest carlosdelest merged commit 258a8b5 into elastic:main Jul 1, 2024
15 checks passed
sorenlouv added a commit to elastic/kibana that referenced this pull request Nov 14, 2024
Closes elastic/obs-ai-assistant-team#162 
Closes #192757

This replaces the ML inference pipeline with `semantic_text` and adds a
migration task that runs automatically when Kibana starts.

Blocked by:
 - elastic/elasticsearch#110027
 - elastic/elasticsearch#110033
 - elastic/ml-team#1298
sorenlouv added a commit to sorenlouv/kibana that referenced this pull request Nov 14, 2024
…ic#186499)

Closes elastic/obs-ai-assistant-team#162
Closes elastic#192757

This replaces the ML inference pipeline with `semantic_text` and adds a
migration task that runs automatically when Kibana starts.

Blocked by:
 - elastic/elasticsearch#110027
 - elastic/elasticsearch#110033
 - elastic/ml-team#1298

(cherry picked from commit 671ff30)

# Conflicts:
#	x-pack/plugins/translations/translations/zh-CN.json
CAWilson94 pushed a commit to CAWilson94/kibana that referenced this pull request Nov 18, 2024
…ic#186499)

Closes elastic/obs-ai-assistant-team#162 
Closes elastic#192757

This replaces the ML inference pipeline with `semantic_text` and adds a
migration task that runs automatically when Kibana starts.

Blocked by:
 - elastic/elasticsearch#110027
 - elastic/elasticsearch#110033
 - elastic/ml-team#1298
CAWilson94 pushed a commit to CAWilson94/kibana that referenced this pull request Nov 18, 2024
…ic#186499)

Closes elastic/obs-ai-assistant-team#162 
Closes elastic#192757

This replaces the ML inference pipeline with `semantic_text` and adds a
migration task that runs automatically when Kibana starts.

Blocked by:
 - elastic/elasticsearch#110027
 - elastic/elasticsearch#110033
 - elastic/ml-team#1298
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>non-issue :Search Foundations/Mapping Index mappings, including merging and defining field types Team:Search Meta label for search team v8.15.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants