-
Notifications
You must be signed in to change notification settings - Fork 25.6k
Support querying multiple indices with the simplified RRF retriever #134822
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support querying multiple indices with the simplified RRF retriever #134822
Conversation
Hi @ioanatia, I've created a changelog YAML for you. |
Pinging @elastic/search-relevance (Team:Search - Relevance) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great, nice work! 👏
One question on tests.
|
||
- match: { hits.total.value: 6 } | ||
- length: { hits.hits: 6 } | ||
- match: { hits.hits.0._id: "4" } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have we tested this with different shard counts, to make sure it won't be flakey in serverless where the default shard count is different or with different randomized shard configurations?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have not - but this is just following the same pattern for linear, for which I have not seen any failures.
we are also not matching on the score values, which always has the potential for flakiness.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! I left one non-blocking comment about a small test change.
// Non-default rank window size | ||
retriever = new RRFRetrieverBuilder( | ||
null, | ||
List.of("field_1", "field_2", "semantic_field_1", "semantic_field_2"), | ||
"foo2", | ||
DEFAULT_RANK_WINDOW_SIZE * 2, | ||
RRFRetrieverBuilder.DEFAULT_RANK_CONSTANT, | ||
new float[0] | ||
); | ||
assertMultiIndexMultiFieldsParamsRewrite( | ||
retriever, | ||
queryRewriteContext, | ||
Map.of( | ||
Map.of("field_1", 1.0f, "field_2", 1.0f), | ||
List.of(indexName), | ||
Map.of("field_1", 1.0f, "field_2", 1.0f, "semantic_field_1", 1.0f), | ||
List.of(anotherIndexName) | ||
), | ||
Map.of( | ||
new Tuple<>("semantic_field_1", List.of(indexName)), | ||
1.0f, | ||
new Tuple<>("semantic_field_2", List.of(indexName)), | ||
1.0f, | ||
new Tuple<>("semantic_field_2", List.of(anotherIndexName)), | ||
1.0f | ||
), | ||
"foo2", | ||
null | ||
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we use this to test non-default rank constant as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
promise to do it in a follow up - I don't want to keep @mridula-s109 waiting too much on this PR to continue the work on #132680 - this is the second PR that will introduce conflicts for her PR.
* upstream/main: (43 commits) Unmute testAckedIndexing to see if it still fails on main (elastic#134682) Silence time zone ID deprecation warning for JDK 25 due to log4j2 bug. (elastic#134719) Adding a getUnmodifiableSourceAndMetadata() method to IngestDocument (elastic#134816) Mark the create-index-from-source action as publicly available on Serverless (elastic#134953) ESQL: Rename command from INLINESTATS to INLINE STATS (elastic#134827) Document multi index query support for simplified retrievers (elastic#134980) [ML] Fix YAMl test to use correct query parameter type (elastic#134999) [Transform] Wait for PIT to close (elastic#134955) Add XPath to XmlUtils (elastic#134923) Fixing conditional processor mutability bugs (elastic#134936) [Transform] Lower loglevel of 3 transform-related error messages from ERROR to WARN (elastic#134985) Unmute pattern text tests. (elastic#134981) Integrate weights into simplified RRF retriever syntax (elastic#132680) Mute org.elasticsearch.xpack.esql.qa.mixed.MixedClusterEsqlSpecIT test {csv-spec:stats.CountDistinctWithConditions} elastic#134993 Update periodic java-ea build to test java 26 pre-release (elastic#134983) Mute org.elasticsearch.xpack.esql.ccq.MultiClusterSpecIT test {csv-spec:stats.CountDistinctWithConditions} elastic#134984 Fix and unmute testIndexSettingProviderPrivateSetting (elastic#134861) Add missing common cat params (elastic#134870) Support querying multiple indices with the simplified RRF retriever (elastic#134822) Allow including semantic field embeddings in _source (elastic#134717) ...
follows the same approach from #133720 where we add support for querying multiple indices for the simplified linear retriever.
#133720 already does most of it, here we just need to remove the validation check that disallowed querying multiple indices and add tests.
the tests are very similar to #133720, I just needed to adjust them slightly since specifying boosts for fields is disallowed for the simplified RRF retriever.