[FEATURE] support default model id in neural_sparse query #614
Conversation
Force-pushed from 8257486 to 7176ebb
Codecov Report — Attention: patch coverage details below.

@@             Coverage Diff              @@
##               main     #614      +/-   ##
============================================
+ Coverage     82.57%   82.65%    +0.07%
- Complexity      656      663        +7
============================================
  Files            52       52
  Lines          2055     2064        +9
  Branches        328      330        +2
============================================
+ Hits           1697     1706        +9
  Misses          212      212
  Partials        146      146

☔ View full report in Codecov by Sentry.
@zhichao-aws We have fixed the build on main. Can you rebase and try running the failing tests again? Thanks
Force-pushed from acce8d2 to 7f3d52f
Rebased on #615.
Resolved review threads on:
src/main/java/org/opensearch/neuralsearch/query/NeuralSparseQueryBuilder.java
src/main/java/org/opensearch/neuralsearch/query/ModelInferenceQueryBuilder.java (4 threads, 3 outdated)
src/test/java/org/opensearch/neuralsearch/processor/NeuralQueryEnricherProcessorIT.java (outdated)
Force-pushed from 7f3d52f to 1189b4e
Signed-off-by: zhichao-aws <zhichaog@amazon.com>
Resolved review threads (4) on:
...rt-upgrade/src/test/java/org/opensearch/neuralsearch/bwc/NeuralQueryEnricherProcessorIT.java
    }
    }

    public void testNeuralQueryEnricherProcessor_NeuralSearch_E2EFlow() throws Exception {
The same comments I made on the tests above apply here as well.
Resolved review thread on:
...ng-upgrade/src/test/java/org/opensearch/neuralsearch/bwc/NeuralQueryEnricherProcessorIT.java
This scenario is similar to a user having 2 knn fields that need 2 default model_ids for the neural query. In this case, the user can set the neural_field_default_id map.
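For context, these per-field defaults are configured on the neural_query_enricher search request processor. Below is a sketch of such a pipeline; the field names and model ids are placeholders, not values from this PR:

```json
PUT /_search/pipeline/default-model-pipeline
{
  "request_processors": [
    {
      "neural_query_enricher": {
        "default_model_id": "<global-model-id>",
        "neural_field_default_id": {
          "knn_field_1": "<model-id-1>",
          "knn_field_2": "<model-id-2>"
        }
      }
    }
  ]
}
```

Queries that omit model_id on knn_field_1 or knn_field_2 then pick up the corresponding per-field default, with default_model_id as the fallback.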
    /**
     * Set a new model id for the query builder.
     */
    public ModelInferenceQueryBuilder modelId(String modelId);
Is this return value used anywhere?
Yes, it's used by the NeuralSearchQueryVisitor to check whether the model id field is null.
Can you point to the line using this return value?
The line you're pointing at seems to be the getter method without a parameter; am I missing anything?
Sorry, I misunderstood your question. The invocation is here:
https://github.com/zhichao-aws/neural-search/blob/1ade82f5566b96fe617ab1f6c5933ebad3258c6d/src/main/java/org/opensearch/neuralsearch/query/visitor/NeuralSearchQueryVisitor.java#L38
https://github.com/zhichao-aws/neural-search/blob/1ade82f5566b96fe617ab1f6c5933ebad3258c6d/src/main/java/org/opensearch/neuralsearch/query/visitor/NeuralSearchQueryVisitor.java#L40
We just set the modelId and don't use the return value. This setter method is generated automatically by lombok's @Setter annotation, and for a chained setter the return value is the class itself by default.
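For illustration, here is a hand-written equivalent of what a lombok chained setter expands to. SparseQueryBuilderSketch is a hypothetical, simplified stand-in, not the plugin's real class:

```java
// Hand-written equivalent of a lombok-generated chained setter
// (hypothetical simplified class, not the plugin's actual code).
class SparseQueryBuilderSketch {
    private String modelId;

    // Chained setter: returns the builder itself so calls can be fluent.
    public SparseQueryBuilderSketch modelId(String modelId) {
        this.modelId = modelId;
        return this;
    }

    // Matching fluent getter.
    public String modelId() {
        return modelId;
    }
}

public class FluentSetterDemo {
    public static void main(String[] args) {
        SparseQueryBuilderSketch builder = new SparseQueryBuilderSketch();
        // The visitor only needs the side effect; the returned builder is ignored.
        builder.modelId("model-123");
        System.out.println(builder.modelId()); // prints "model-123"
    }
}
```

Returning `this` is what enables fluent chaining; callers that only want the side effect, like the visitor here, can simply ignore the return value.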
Resolved review threads on:
...rt-upgrade/src/test/java/org/opensearch/neuralsearch/bwc/NeuralQueryEnricherProcessorIT.java
src/testFixtures/java/org/opensearch/neuralsearch/BaseNeuralSearchIT.java
Signed-off-by: zhichao-aws <zhichaog@amazon.com>
Hey @zhichao-aws, consider that you created a processor.

By the code definition, first priority is given to neural_field_default_id when setting the model id in the query builder; if neural_field_default_id is not present, it falls back to default_model_id.

Q) Consider an example where a customer wants to do a neural search with a default_model_id that applies to all fields, regardless of any field-specific model id.
A) The customer can set a default_model_id, which applies to all fields.

Q) Now suppose the customer wants a field-specific model id for neural sparse search (say the field name is passage_text), and for neural search wants a default_model_id that applies to all fields. How does your change behave?
A) The customer creates a processor that has a default_model_id for neural search and a field-specific model id for neural sparse search. But as per your code it will always select the field-specific model id, because the field map will never be empty, and the code you added in NeuralSearchQueryVisitor applies the same logic to both neural search and neural sparse search. Moreover, the reply you gave here (https://github.com/opensearch-project/neural-search/pull/614#issuecomment-1986586819) says to create a different model id for each field in the neural_field_default_id map.

There are also likely more scenarios that are missing. I have already discussed this with @navneet1v and would advise creating a new visitor. For more readability, add one more map to the neural_query_enricher processor, i.e. neural_sparse_field_default_id, and use it in the new visitor. Also consider one more scenario:

How can someone give a different model id for a specific field in both of the clauses above, when they share the same field name, using the code implementation in this PR?
This is an impossible scenario, because …
Now the code implementation is as follows: …
Consider we have a default model id for …
LGTM, approving.
Looks much better now. LGTM, but before I approve this merge request, can you open a documentation-website issue to add documentation so customers can understand the feature? Sample issue: opensearch-project/documentation-website#5060
Good point. The documentation issue link: opensearch-project/documentation-website#6652
LGTM
Resolved review threads (2, outdated) on:
...ng-upgrade/src/test/java/org/opensearch/neuralsearch/bwc/NeuralQueryEnricherProcessorIT.java
@@ -108,7 +119,9 @@ protected void doXContent(XContentBuilder xContentBuilder, Params params) throws
     xContentBuilder.startObject(NAME);
     xContentBuilder.startObject(fieldName);
     xContentBuilder.field(QUERY_TEXT_FIELD.getPreferredName(), queryText);
-    xContentBuilder.field(MODEL_ID_FIELD.getPreferredName(), modelId);
+    if (modelId != null) {
We can use Objects.nonNull here.
Ack
    /**
     * Set a new model id for the query builder.
     */
    public ModelInferenceQueryBuilder modelId(String modelId);
I'm not sure I get the design idea for this method. It looks like a setter, but it's returning a value, and in the example you've provided that value is not even used.
If my understanding is correct, I suggest you make the method's return type void.
In NeuralSparseQueryBuilder and NeuralQueryBuilder we have a modelId setter method generated automatically by the lombok @Setter annotation. The auto-generated modelId setter takes the class itself as the return type. Do you prefer we add a setModelId method with a void return value in NeuralSparseQueryBuilder and NeuralQueryBuilder?
    /**
     * Get the model id used by ml-commons model inference. Return null if the model id is absent.
     */
    public String modelId();
If we expect the model id to be null, why not use Optional as the return type? That gives the caller a clear indication that a null object is a designed, normal case rather than an exception.
This getter method is also generated automatically by lombok. Do you prefer we define a getModelId with an Optional return type in NeuralSparseQueryBuilder and NeuralQueryBuilder?
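As a sketch of the reviewer's Optional suggestion, a hand-written accessor could look like the following. ModelIdHolder is a hypothetical stand-in, not the plugin's code:

```java
import java.util.Optional;

// Hypothetical sketch of the Optional suggestion: expose the possibly-absent
// model id through Optional instead of a nullable String.
class ModelIdHolder {
    private String modelId; // may legitimately be null before enrichment

    public void modelId(String modelId) {
        this.modelId = modelId;
    }

    // Optional makes "no model id yet" an explicit, designed case for callers.
    public Optional<String> modelId() {
        return Optional.ofNullable(modelId);
    }
}

public class OptionalGetterDemo {
    public static void main(String[] args) {
        ModelIdHolder holder = new ModelIdHolder();
        System.out.println(holder.modelId().isPresent()); // prints "false"
        holder.modelId("model-abc");
        System.out.println(holder.modelId().orElse("none")); // prints "model-abc"
    }
}
```

The trade-off is that lombok cannot generate this shape directly, so it would be a hand-written method alongside the generated accessors.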
    if (neuralQueryBuilder.modelId() == null) {
    if (queryBuilder instanceof ModelInferenceQueryBuilder) {
        ModelInferenceQueryBuilder modelInferenceQueryBuilder = (ModelInferenceQueryBuilder) queryBuilder;
        if (modelInferenceQueryBuilder.modelId() == null) {
Objects.nonNull, or Optional.isPresent if you'd like to take the Optional route.
ack
    String fieldDefaultModelId = (String) neuralFieldMap.get(neuralQueryBuilder.fieldName());
    neuralQueryBuilder.modelId(fieldDefaultModelId);
        && modelInferenceQueryBuilder.fieldName() != null
        && neuralFieldMap.get(modelInferenceQueryBuilder.fieldName()) != null) {
These conditions are hard to read. Can you encapsulate them in a method and give it a meaningful name, for instance isDefaultModelIdDefined?
ack
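The suggested refactor might look roughly like this; the class and field below are hypothetical stand-ins for the real visitor, shown only to illustrate extracting the null-checks into a named predicate:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of extracting the compound null-checks into a
// well-named predicate, as the reviewer suggests.
class NeuralFieldVisitorSketch {
    private final Map<String, Object> neuralFieldMap;

    NeuralFieldVisitorSketch(Map<String, Object> neuralFieldMap) {
        this.neuralFieldMap = neuralFieldMap;
    }

    // One readable method instead of a chain of && null-checks inline.
    boolean isDefaultModelIdDefined(String fieldName) {
        return neuralFieldMap != null
            && fieldName != null
            && neuralFieldMap.get(fieldName) != null;
    }
}

public class PredicateDemo {
    public static void main(String[] args) {
        Map<String, Object> fieldMap = new HashMap<>();
        fieldMap.put("passage_text", "model-sparse-1");
        NeuralFieldVisitorSketch visitor = new NeuralFieldVisitorSketch(fieldMap);
        System.out.println(visitor.isDefaultModelIdDefined("passage_text")); // prints "true"
        System.out.println(visitor.isDefaultModelIdDefined("other_field"));  // prints "false"
    }
}
```

The call site then reads as a single meaningful condition rather than three chained checks.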
Signed-off-by: zhichao-aws <zhichaog@amazon.com>
Please check the code coverage and fix the GitHub Actions failure.
The backport to 2.x failed.
To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-2.x 2.x
# Navigate to the new working tree
cd .worktrees/backport-2.x
# Create a new branch
git switch --create backport/backport-614-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 e41fba7aa82401bd9dce5dec4e08ea7191512fa9
# Push it to GitHub
git push --set-upstream origin backport/backport-614-to-2.x
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-2.x

Then, create a pull request where the base branch is 2.x and the compare branch is backport/backport-614-to-2.x.
…-project#614)

Squashed commit message (cherry picked from commit e41fba7):

* feature: implement default model id for neural sparse (2 commits)
* add ut
* add ut it
* add changelog
* nit (several commits)
* fix ingest pipeline in it
* add it for bwc restart-upgrade (several commits)
* fix undeploy with retry
* optimize it code structure
* add it for bwc rolling-upgrade
* tidy
* update index mapping in it
* move version check to build script
* resolve modelId
* update init model id
* modify versions check logic in bwc test
* add comments
* updates for comments

Signed-off-by: zhichao-aws <zhichaog@amazon.com>
* [FEATURE] support default model id in neural_sparse query (#614): the squashed feature commits listed above (cherry picked from commit e41fba7)
* resolve conflicts
* spotless Apply
* add dependency
* update build.gradle

Signed-off-by: zhichao-aws <zhichaog@amazon.com>
Description
see #610
In this PR we support a default model_id for the neural_sparse query. The existing code checks the class of the queryBuilder in the visitor (ref) and uses the modelId method to set the default model id for NeuralQueryBuilder only. This PR creates a new interface called ModelInferenceQueryBuilder, which declares the common method for setting a default model id. NeuralQueryBuilder and NeuralSparseQueryBuilder both implement ModelInferenceQueryBuilder, and the visitor checks and sets the default model id for any ModelInferenceQueryBuilder. The unit, integration, and bwc tests cover the changes in this PR and make sure it has no impact on existing functionality.
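To make the design concrete, here is a minimal, self-contained sketch of the interface-plus-visitor idea. All names are simplified, hypothetical stand-ins for the real org.opensearch.neuralsearch.query classes, and the priority rule (field-specific default first, then global default, never overriding an explicit model id) follows the behavior discussed in this PR:

```java
import java.util.Map;

// Simplified sketch of the ModelInferenceQueryBuilder idea: both neural and
// neural_sparse builders share a model-id contract, so a single visitor can
// enrich either one. Names are illustrative only.
interface ModelInferenceQueryBuilderSketch {
    String modelId();
    void modelId(String modelId);
    String fieldName();
}

class SparseBuilder implements ModelInferenceQueryBuilderSketch {
    private String modelId;
    private final String fieldName;
    SparseBuilder(String fieldName) { this.fieldName = fieldName; }
    public String modelId() { return modelId; }
    public void modelId(String modelId) { this.modelId = modelId; }
    public String fieldName() { return fieldName; }
}

class EnricherVisitorSketch {
    private final String defaultModelId;
    private final Map<String, String> neuralFieldMap;

    EnricherVisitorSketch(String defaultModelId, Map<String, String> neuralFieldMap) {
        this.defaultModelId = defaultModelId;
        this.neuralFieldMap = neuralFieldMap;
    }

    // Field-specific default wins; otherwise fall back to the global default.
    void accept(ModelInferenceQueryBuilderSketch builder) {
        if (builder.modelId() != null) {
            return; // an explicit model id in the query is never overridden
        }
        if (neuralFieldMap != null && builder.fieldName() != null
                && neuralFieldMap.get(builder.fieldName()) != null) {
            builder.modelId(neuralFieldMap.get(builder.fieldName()));
        } else if (defaultModelId != null) {
            builder.modelId(defaultModelId);
        }
    }
}

public class EnricherDemo {
    public static void main(String[] args) {
        EnricherVisitorSketch visitor =
            new EnricherVisitorSketch("global-model", Map.of("passage_text", "sparse-model"));
        SparseBuilder builder = new SparseBuilder("passage_text");
        visitor.accept(builder);
        System.out.println(builder.modelId()); // prints "sparse-model"
    }
}
```

Because the visitor depends only on the interface, adding another model-backed query type later means implementing the interface rather than adding a new instanceof branch.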
Issues Resolved
#610
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.