-
Notifications
You must be signed in to change notification settings - Fork 69
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add parsing logic for neural query #15
Conversation
public NeuralQueryBuilder fieldName(String fieldName) { | ||
this.fieldName = fieldName; | ||
return this; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we see if we can use @builder of lombok.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about this, but I didnt see how it would work, given we are extending a builder class. Instead, I followed how AbstractQueryBuilder handled it for the boost field: https://github.com/opensearch-project/OpenSearch/blob/main/server/src/main/java/org/opensearch/index/query/AbstractQueryBuilder.java#L172
static final ParseField QUERY_TEXT_FIELD = new ParseField("query_text"); | ||
|
||
static final ParseField MODEL_ID_FIELD = new ParseField("model_id"); | ||
|
||
static final ParseField K_FIELD = new ParseField("k"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why these are not private? and is package private?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
may be we can use @VisibleForTesting
for clarification.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it was for testing purposes. I will add annotation.
public NeuralQueryBuilder fieldName(String fieldName) { | ||
this.fieldName = fieldName; | ||
return this; | ||
} | ||
|
||
/** | ||
* Set the queryText that will be translated into the dense query vector used for k-NN search. | ||
* | ||
* @param queryText Text of a query that should be translated to a dense vector | ||
* @return this | ||
*/ | ||
public NeuralQueryBuilder queryText(String queryText) { | ||
this.queryText = queryText; | ||
return this; | ||
} | ||
|
||
/** | ||
* Set the modelId that should produce the dense query vector | ||
* | ||
* @param modelId ID of model to produce query vector | ||
* @return this | ||
*/ | ||
public NeuralQueryBuilder modelId(String modelId) { | ||
this.modelId = modelId; | ||
return this; | ||
} | ||
|
||
/** | ||
* Set the number of neighbors that should be retrieved during k-NN search | ||
* | ||
* @param k number of neighbors to be retrieved in k-NN query | ||
* @return this | ||
*/ | ||
public NeuralQueryBuilder k(int k) { | ||
this.k = k; | ||
return this; | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can use @Setter for these. any reason for not using setters?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I wasn't sure how to use lombok for this, but it looks like it is possible with:
@Setter
@Accessors(chain=true,fluent=true)
@Override | ||
protected int doHashCode() { | ||
return new HashCodeBuilder().append(fieldName).append(queryText).append(modelId).append(k).toHashCode(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
any reason for not using @equals and hascode?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reason for this is that we are not implementing "hashCode" and "equals" method, but instead "doHashCode" and "doEquals".
private static final float BOOST = 1.8f; | ||
private static final String QUERY_NAME = "queryName"; | ||
|
||
public void testFromXContent_valid_withDefaults() throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see if we want to use @SneakyThrows
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will update
assertEquals(K, neuralQueryBuilder.getK()); | ||
} | ||
|
||
public void testFromXContent_valid_withOptionals() throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see if we want to rename the test function to something like testFromXConent_whenXXXX_thenYYYYY for all the functions. This can help in just getting to know what is being tested with what just be reading the name of the fuction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, will update
out.writeString(this.fieldName); | ||
out.writeString(this.queryText); | ||
out.writeString(this.modelId); | ||
out.writeInt(this.k); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see if want to use writeVInt() instead of Int. VInt uses less bytes than int.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will update
int k1 = 1; | ||
int k2 = 2; | ||
|
||
NeuralQueryBuilder neuralQueryBuilder1 = new NeuralQueryBuilder().fieldName(fieldName1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rather than using 1,2,3,4,5 can we rename them in fashion with which just by variable name we can tell what is missing/what default values we have used
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will update
static final ParseField QUERY_TEXT_FIELD = new ParseField("query_text"); | ||
|
||
static final ParseField MODEL_ID_FIELD = new ParseField("model_id"); | ||
|
||
static final ParseField K_FIELD = new ParseField("k"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
may be we can use @VisibleForTesting
for clarification.
requireValue(neuralQueryBuilder.getQueryText(), "Query text must be provided for neural query"); | ||
requireValue(neuralQueryBuilder.getFieldName(), "Field name must be provided for neural query"); | ||
requireValue(neuralQueryBuilder.getModelId(), "Model ID must be provided for neural query"); | ||
requireValue(neuralQueryBuilder.getK(), "K must be provided for neural query"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't k
have a default value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking about this. Yes, I think we can make it 10 by default.
} | ||
|
||
@Override | ||
protected Query doToQuery(QueryShardContext queryShardContext) throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's the purpose of this function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"toQuery" will build a Query object using the queryShardContext and the queryBuilder. "doToQuery" is used internally to produce the query in AbstractQueryBuilder.
Its null right now because we will be implementing in future PR. Ill add a comment.
From this query example, user could run neural search on multiple |
No, user would be able to run one neural search on one field per query. |
If we can only support one filed per neural search, how about change to this format to avoid confusion
User may think this could be a correct neural query
|
@ylwu-amzn Yes, I see what you mean. The main reason that I opted to make the field name the key was to follow OpenSearch convention. For instance, for the text matching query types, the field is the key. Similarly, this is how it is done for k-NN. |
Adds new query builder to plugin: NeuralQueryBuilder. Adds logic to parse the user provided parameters for the NeuralQueryBuilder as well as a few other smaller methods. Query text to dense vector logic will be added in a future PR Adds unit tests. Signed-off-by: John Mazanec <jmazane@amazon.com>
Signed-off-by: John Mazanec <jmazane@amazon.com>
1721893
to
74fb5a1
Compare
Signed-off-by: John Mazanec <jmazane@amazon.com>
Signed-off-by: John Mazanec <jmazane@amazon.com>
@@ -119,6 +119,7 @@ allprojects { | |||
} | |||
|
|||
repositories { | |||
mavenLocal() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[Nit-pick] should we add this? I think we can remove this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added it explicitly because it is required for developing against ml-commons unpublished changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know it is required for local development, but ideally we should not put this, as it can lead to errors with different people can different versions.
These should be added as a part of Development.md file that we need to enable mavenLocal(). But good for now.
Thanks, that makes sense. Make sure the error message is readable. Tell user we only support one query field. like this one
|
Signed-off-by: John Mazanec <jmazane@amazon.com>
@ylwu-amzn Makes sense. Just updated. |
Signed-off-by: John Mazanec <jmazane@amazon.com>
Description
Adds new query builder to plugin: NeuralQueryBuilder. Adds logic to parse the user provided parameters for the NeuralQueryBuilder as well as a few other smaller methods. Query text to dense vector logic will be added in a future PR
The query has the following structure:
Adds unit tests as well as test utility class
Issues Resolved
#14 (partially resolved)
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.