-
Notifications
You must be signed in to change notification settings - Fork 25.6k
Weights in simplified RRF retriever cleanup #135040
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Pinging @elastic/search-relevance (Team:Search - Relevance) |
"Basic per-field boosting using the simplified format": | ||
- requires: | ||
cluster_features: ["rrf_retriever.simplified_weighted_support"] | ||
reason: "Simplified weighted fields syntax support" | ||
|
||
- do: | ||
search: | ||
index: test-index | ||
body: | ||
retriever: | ||
rrf: | ||
fields: [ "text_1", "text_2^2" ] | ||
query: "foo" | ||
|
||
# With weighted fields, verify basic functionality | ||
- gte: { hits.total.value: 1 } | ||
- length: { hits.hits: 1 } | ||
# Verify that text_2^2 affects ranking (basic smoke test) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test is redundant, as we already have a test that applies per-field boosts to lexical matches and checks for changes in result order. Additionally, the assertions performed in this test do not meaningfully check that the boost actually was applied.
"Semantic field weighting": | ||
- requires: | ||
cluster_features: ["rrf_retriever.simplified_weighted_support"] | ||
reason: "Simplified weighted fields syntax support" | ||
|
||
- do: | ||
search: | ||
index: test-index | ||
body: | ||
retriever: | ||
rrf: | ||
fields: ["dense_inference^2", "sparse_inference^1.5"] | ||
query: "elasticsearch" | ||
|
||
- match: { hits.total.value: 3 } | ||
- length: { hits.hits: 3 } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as https://github.com/elastic/elasticsearch/pull/135040/files#r2361285845, but for boosts on semantic matches.
"Zero weight handling": | ||
- requires: | ||
cluster_features: ["rrf_retriever.simplified_weighted_support"] | ||
reason: "Simplified weighted fields syntax support" | ||
|
||
- do: | ||
search: | ||
index: test-index | ||
body: | ||
retriever: | ||
rrf: | ||
fields: ["text_1^0", "text_2^1"] | ||
query: "foo" | ||
|
||
- gte: { hits.total.value: 1 } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We already have coverage for this in the rewrite tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RRF scores are generally not meaningful, hence why the existing tests did not check them. Moreover, the way they were being checked did not add any real value.
@elasticmachine update branch |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Test looks good to me. The one piece of feedback that I have for the unit tests, is that it would be really helpful to add comments to the tests explaining why the scores are expected. Just to make it a little readable and parseable. Maybe that's something @mridula-s109 could help with in a followup if we agree it would be useful?
Good point - adding comments to explain the expected scores would definitely make the tests more readable. I’m happy to take that up in a follow-up PR if everyone agrees. |
Do you mean the per-field boosts? We don't check document scores in the unit tests. |
Sorry, yes, e.g. quickly explain why we would expect a boost of 3.75 on a combined field |
Cleans up tests added in #132680. The rewrite tests in
LinearRetrieverBuilderTests
andRRFRetrieverBuilderTests
are now aligned, testing the same scenarios in the same order. This should make these tests easier to maintain. The YAML tests have been adjusted to remove redundant tests and score checks that did not add value.