Supporting sparse semantic retrieval in neural search #333
Some integ tests fail because they invoke the new sparse models in ml-commons, and the ml-commons PR hasn't been merged yet.
We have finalized our code; it is ready for review now.
Force-pushed from 91f4d1f to 79ce678.
The force push was to fix the DCO check.
Signed-off-by: zhichao-aws <zhichaog@amazon.com>
Signed-off-by: zane-neo <zaniu@amazon.com>
Signed-off-by: zane-neo <zaniu@amazon.com>
Signed-off-by: zane-neo <zaniu@amazon.com>
Signed-off-by: xinyual <xinyual@amazon.com>
Signed-off-by: xinyual <xinyual@amazon.com>
Signed-off-by: xinyual <xinyual@amazon.com>
Signed-off-by: xinyual <xinyual@amazon.com>
Signed-off-by: zane-neo <zaniu@amazon.com>
Signed-off-by: zane-neo <zaniu@amazon.com>
@NonNull final List<String> inputText,
@NonNull final ActionListener<List<Map<String, ?>>> listener
) {
    retryableInferenceSentencesWithMapResult(modelId, inputText, 0, listener);
Why do we want no retries here?
I think we are staying consistent with the existing inferenceSentences method. @zane-neo Could you please help answer this?
The 0 here doesn't mean no retries; it is only the initial value. The value is incremented on each retry until it reaches the maximum retry count (3). In the future this could be changed to count down instead, which would be more intuitive.
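To illustrate the counting scheme described in this reply, here is a minimal sketch. It is not the plugin's code: `RetrySketch`, `callWithRetry` and `MAX_RETRY` are hypothetical names, the call is synchronous rather than listener-based, and it retries on any `RuntimeException` for simplicity.

```java
import java.util.function.Supplier;

final class RetrySketch {
    // Assumed cap, taken from the "max retry time(3)" mentioned in the thread.
    static final int MAX_RETRY = 3;

    // retryTime is the number of retries already used; callers start at 0.
    static <T> T callWithRetry(Supplier<T> action, int retryTime) {
        try {
            return action.get();
        } catch (RuntimeException e) {
            if (retryTime < MAX_RETRY) {
                // Count up: 0 -> 1 -> 2 -> 3, then give up.
                return callWithRetry(action, retryTime + 1);
            }
            throw e;
        }
    }
}
```

Passing 0 therefore allows one initial attempt plus up to three retries. A count-down variant would pass the remaining budget (3) and stop at 0, which is the more intuitive form suggested above.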
src/main/java/org/opensearch/neuralsearch/ml/MLCommonsClientAccessor.java
@@ -144,4 +175,20 @@ private List<List<Float>> buildVectorFromResponse(MLOutput mlOutput) {
        return vector;
    }

    private List<Map<String, ?>> buildMapResultFromResponse(MLOutput mlOutput) {
        final ModelTensorOutput modelTensorOutput = (ModelTensorOutput) mlOutput;
Is this cast safe? Or should we check?
Same as the one above. Maybe we can keep the code consistent between the 2 use cases. If they need to be fixed, we can create another PR to fix them.
neural-search/src/main/java/org/opensearch/neuralsearch/ml/MLCommonsClientAccessor.java
Line 136 in 8484be9
final ModelTensorOutput modelTensorOutput = (ModelTensorOutput) mlOutput;
We can fix the other one in the future, but I want to understand how we know for certain that this won't cause a ClassCastException on invalid input.
@zane-neo could you please help answer this?
The two inference methods we created in neural-search (inferenceWithVectorResult and inferenceWithMapResult) are for text embedding and sparse embedding only, and both return ModelTensorOutput. The latter could be extended to other use cases in the future, e.g. remote inference, but in that case it would still return ModelTensorOutput. So it is safe to cast this output to ModelTensorOutput.
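If a defensive check were wanted anyway, it could look roughly like the sketch below. The nested `MLOutput`, `ModelTensorOutput` and `OtherOutput` types are hypothetical stand-ins for the ml-commons classes named in the thread, and `requireTensorOutput` is an invented helper, not part of this PR.

```java
final class CastSketch {
    // Hypothetical stand-ins for the ml-commons types discussed above.
    interface MLOutput {}
    static final class ModelTensorOutput implements MLOutput {}
    static final class OtherOutput implements MLOutput {}

    // Verifies the runtime type before casting, so an unexpected output
    // fails with a descriptive message instead of a bare ClassCastException.
    static ModelTensorOutput requireTensorOutput(MLOutput mlOutput) {
        if (!(mlOutput instanceof ModelTensorOutput)) {
            String actual = mlOutput == null ? "null" : mlOutput.getClass().getSimpleName();
            throw new IllegalStateException("Expected ModelTensorOutput but got " + actual);
        }
        return (ModelTensorOutput) mlOutput;
    }
}
```

The trade-off is a small amount of extra code for an error that names the unexpected type, which may help if other output types are ever routed through these methods.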
Signed-off-by: zhichao-aws <zhichaog@amazon.com>
Signed-off-by: zhichao-aws <zhichaog@amazon.com>
Signed-off-by: zhichao-aws <zhichaog@amazon.com>
Signed-off-by: zhichao-aws <zhichaog@amazon.com>
Sure
src/main/java/org/opensearch/neuralsearch/query/SparseEncodingQueryBuilder.java
out.writeOptionalString(queryText);
out.writeOptionalString(modelId);
Why are these 2 optional? In the fromXContent it looks like they require a value.
Good point. The original design had 2 query forms, and the final proposal keeps only this one, but I didn't update this part. Will fix this.
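The difference between an optional and a required string on the wire can be sketched as follows. This is only an illustration using `java.io.DataOutputStream` as a stand-in: `OptionalStringSketch` and its methods are hypothetical, and the byte layout is not the OpenSearch `StreamOutput` format.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Objects;

final class OptionalStringSketch {
    // Optional form: a presence flag first, then the value only if present.
    static byte[] encodeOptional(String value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeBoolean(value != null);
            if (value != null) {
                out.writeUTF(value);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Required form: no presence flag, and a null value is rejected up front.
    static byte[] encodeRequired(String value) {
        Objects.requireNonNull(value, "value is required");
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }
}
```

When parsing already requires both values, the required form keeps serialization consistent with that contract, which is the mismatch the reviewer pointed out.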
Signed-off-by: zhichao-aws <zhichaog@amazon.com>
CHANGELOG.md
@@ -5,6 +5,7 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),

## [Unreleased 3.0](https://github.com/opensearch-project/neural-search/compare/2.x...HEAD)
### Features
Support sparse semantic retrieval by introducing `sparse_encoding` ingest processor and query builder ([#333](https://github.com/opensearch-project/neural-search/pull/333))
I think we typically target one version per PR; if this is for 2.11, the entry should go under 2.x.
Sure, will fix this.
runtimeOnly group: 'com.google.code.gson', name: 'gson', version: '2.10.1'
runtimeOnly group: 'org.json', name: 'json', version: '20230227'
Confirming, I've seen the same behavior while working on multimodal.
Map<String, Object> sourceAndMetadataMap,
Map<String, Object> treeRes
public void doExecute(
    IngestDocument ingestDocument,
Arguments should be final, at least for public methods; that's the convention, if I remember correctly.
Good point. Furthermore, we can mark the two inheriting processor classes final, since each ingest processor should have only one behavior.
Signed-off-by: zhichao-aws <zhichaog@amazon.com>
Map<String, Float> result = new HashMap<>();
for (Map.Entry<?, ?> entry : ((Map<?, ?>) uncastedMap).entrySet()) {
    if (!String.class.isAssignableFrom(entry.getKey().getClass()) || !Number.class.isAssignableFrom(entry.getValue().getClass())) {
        throw new IllegalArgumentException("The expected inference result is a Map with String keys and " + " Float values.");
Why do we use "+" to concatenate the strings? You can just write everything as one string.
I think the automatic tidy tool merged 2 lines. Will remove the "+".
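For reference, the check in the quoted snippet can be written as a self-contained sketch with the message as a single literal. `TokenWeightSketch` and `toTokenWeights` are hypothetical names, and using `instanceof` instead of `getClass()` is a substitution made here so that null keys or values are rejected rather than causing a NullPointerException.

```java
import java.util.HashMap;
import java.util.Map;

final class TokenWeightSketch {
    private static final String ERROR =
        "The expected inference result is a Map with String keys and Float values.";

    // Converts an untyped inference result into a token -> weight map,
    // rejecting anything that is not a Map of String keys and numeric values.
    static Map<String, Float> toTokenWeights(Object uncastedMap) {
        if (!(uncastedMap instanceof Map)) {
            throw new IllegalArgumentException(ERROR);
        }
        Map<String, Float> result = new HashMap<>();
        for (Map.Entry<?, ?> entry : ((Map<?, ?>) uncastedMap).entrySet()) {
            // instanceof is false for null, so null keys or values fail here too.
            if (!(entry.getKey() instanceof String) || !(entry.getValue() instanceof Number)) {
                throw new IllegalArgumentException(ERROR);
            }
            result.put((String) entry.getKey(), ((Number) entry.getValue()).floatValue());
        }
        return result;
    }
}
```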
modelId(),
List.of(queryText),
ActionListener.wrap(mapResultList -> {
    queryTokensSetOnce.set(TokenWeightUtil.fetchListOfTokenWeightMap(mapResultList).get(0));
Is it safe to assume that the list has at least one element, without checking that it is not empty?
Good point. Will add the check in the util class.
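One possible shape for such a check is sketched below. `FirstResultSketch` and `firstOrThrow` are hypothetical names chosen for illustration; the actual change made in `TokenWeightUtil` is not shown in this thread.

```java
import java.util.List;
import java.util.Map;

final class FirstResultSketch {
    // Returns the first token-weight map, failing with a clear message
    // instead of an IndexOutOfBoundsException when no result came back.
    static Map<String, Float> firstOrThrow(List<Map<String, Float>> results) {
        if (results == null || results.isEmpty()) {
            throw new IllegalArgumentException("The inference result cannot be null or empty.");
        }
        return results.get(0);
    }
}
```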
private static void validateForRewrite(String queryText, String modelId) {
    if (StringUtils.isBlank(queryText) || StringUtils.isBlank(modelId)) {
        throw new IllegalArgumentException(
            QUERY_TEXT_FIELD.getPreferredName() + " and " + MODEL_ID_FIELD.getPreferredName() + " cannot be null."
Please use String.format for this error message.
Sure, I have rewritten all strings with more than 2 args to use String.format.
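A minimal sketch of the requested form follows. The field names `query_text` and `model_id` and the local `isBlank` helper are assumptions standing in for `QUERY_TEXT_FIELD.getPreferredName()`, `MODEL_ID_FIELD.getPreferredName()` and `StringUtils.isBlank`. `Locale.ROOT` is passed explicitly, in line with the "add locale to String.format" commit in this PR.

```java
import java.util.Locale;

final class ValidationSketch {
    // Assumed field names, for illustration only.
    static final String QUERY_TEXT_FIELD = "query_text";
    static final String MODEL_ID_FIELD = "model_id";

    static void validateForRewrite(String queryText, String modelId) {
        if (isBlank(queryText) || isBlank(modelId)) {
            throw new IllegalArgumentException(
                String.format(Locale.ROOT, "%s and %s cannot be null.", QUERY_TEXT_FIELD, MODEL_ID_FIELD)
            );
        }
    }

    // Simplified stand-in for StringUtils.isBlank.
    private static boolean isBlank(String s) {
        return s == null || s.trim().isEmpty();
    }
}
```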
Signed-off-by: zhichao-aws <zhichaog@amazon.com>
Signed-off-by: zhichao-aws <zhichaog@amazon.com>
Signed-off-by: zhichao-aws <zhichaog@amazon.com>
* sparse mapper field and query builder Signed-off-by: zhichao-aws <zhichaog@amazon.com>
* fix typo Signed-off-by: zhichao-aws <zhichaog@amazon.com>
* Add map result support in neural search for non text embedding models Signed-off-by: zane-neo <zaniu@amazon.com>
* Fix compilation failure issue Signed-off-by: zane-neo <zaniu@amazon.com>
* Add more UTs Signed-off-by: zane-neo <zaniu@amazon.com>
* add sparse encoding processor Signed-off-by: xinyual <xinyual@amazon.com>
* add sparse encoding processor Signed-off-by: xinyual <xinyual@amazon.com>
* remove guava in gradle Signed-off-by: xinyual <xinyual@amazon.com>
* modify access control Signed-off-by: xinyual <xinyual@amazon.com>
* Add map result support in neural search for non text embedding models Signed-off-by: zane-neo <zaniu@amazon.com>
* Fix compilation failure issue Signed-off-by: zane-neo <zaniu@amazon.com>
* change output logic Signed-off-by: xinyual <xinyual@amazon.com>
* create abstract Signed-off-by: xinyual <xinyual@amazon.com>
* create abstract proccesor Signed-off-by: xinyual <xinyual@amazon.com>
* add abstract class Signed-off-by: xinyual <xinyual@amazon.com>
* remove duplicate code Signed-off-by: xinyual <xinyual@amazon.com>
* remove duplicate code Signed-off-by: xinyual <xinyual@amazon.com>
* remove dl process Signed-off-by: xinyual <xinyual@amazon.com>
* move static to abstract class Signed-off-by: xinyual <xinyual@amazon.com>
* update query rewrite logic Signed-off-by: zhichao-aws <zhichaog@amazon.com>
* modify header Signed-off-by: zhichao-aws <zhichaog@amazon.com>
* merge conflict Signed-off-by: xinyual <xinyual@amazon.com>
* delete index mapper, change to rank_features Signed-off-by: zhichao-aws <zhichaog@amazon.com>
* remove unused import Signed-off-by: zhichao-aws <zhichaog@amazon.com>
* list return result Signed-off-by: zhichao-aws <zhichaog@amazon.com>
* refactor type and listTypeNestedMapKey, tidy Signed-off-by: zhichao-aws <zhichaog@amazon.com>
* forbid nested input. tidy. Signed-off-by: zhichao-aws <zhichaog@amazon.com>
* tidy Signed-off-by: zhichao-aws <zhichaog@amazon.com>
* enable nested Signed-off-by: zhichao-aws <zhichaog@amazon.com>
* fix test Signed-off-by: zhichao-aws <zhichaog@amazon.com>
* Add ut it to sparse encoding processor (#6)
  * fix original UT problem Signed-off-by: xinyual <xinyual@amazon.com>
  * add UT IT Signed-off-by: xinyual <xinyual@amazon.com>
  * add more UT Signed-off-by: xinyual <xinyual@amazon.com>
  * add more ut Signed-off-by: xinyual <xinyual@amazon.com>
  * fix typo error Signed-off-by: xinyual <xinyual@amazon.com>
  --------- Signed-off-by: xinyual <xinyual@amazon.com>
* utils, tidy Signed-off-by: zhichao-aws <zhichaog@amazon.com>
* rename to sparse_encoding query Signed-off-by: zhichao-aws <zhichaog@amazon.com>
* add validation and ut Signed-off-by: zhichao-aws <zhichaog@amazon.com>
* sparse encoding query builder ut Signed-off-by: zhichao-aws <zhichaog@amazon.com>
* rename Signed-off-by: zhichao-aws <zhichaog@amazon.com>
* UT for utils Signed-off-by: zhichao-aws <zhichaog@amazon.com>
* enrich sparse encoding IT mappings Signed-off-by: zhichao-aws <zhichaog@amazon.com>
* add it Signed-off-by: zhichao-aws <zhichaog@amazon.com>
* add it Signed-off-by: zhichao-aws <zhichaog@amazon.com>
* add integ test Signed-off-by: zhichao-aws <zhichaog@amazon.com>
* rename resource file Signed-off-by: zhichao-aws <zhichaog@amazon.com>
* tidy Signed-off-by: zhichao-aws <zhichaog@amazon.com>
* remove BoundedLinearQuery and TokenScoreUpperBound Signed-off-by: zhichao-aws <zhichaog@amazon.com>
* tidy Signed-off-by: zhichao-aws <zhichaog@amazon.com>
* add delta to loose the equal Signed-off-by: zhichao-aws <zhichaog@amazon.com>
* move SparseEncodingQueryBuilder to upper level path Signed-off-by: zhichao-aws <zhichaog@amazon.com>
* tidy Signed-off-by: zhichao-aws <zhichaog@amazon.com>
* add it Signed-off-by: zhichao-aws <zhichaog@amazon.com>
* Update src/main/java/org/opensearch/neuralsearch/ml/MLCommonsClientAccessor.java Co-authored-by: zane-neo <zaniu@amazon.com> Signed-off-by: zhichao-aws <zhichaog@amazon.com>
* Update src/main/java/org/opensearch/neuralsearch/util/TokenWeightUtil.java Co-authored-by: zane-neo <zaniu@amazon.com> Signed-off-by: zhichao-aws <zhichaog@amazon.com>
* restore gradle.propeties Signed-off-by: zhichao-aws <zhichaog@amazon.com>
* add release notes Signed-off-by: zhichao-aws <zhichaog@amazon.com>
* change field modifier to private for NLPProcessor Signed-off-by: zhichao-aws <zhichaog@amazon.com>
* add comments Signed-off-by: zhichao-aws <zhichaog@amazon.com>
* use StringUtils to check Signed-off-by: zhichao-aws <zhichaog@amazon.com>
* null check Signed-off-by: zhichao-aws <zhichaog@amazon.com>
* modify changelog Signed-off-by: zhichao-aws <zhichaog@amazon.com>
* nit Signed-off-by: zhichao-aws <zhichaog@amazon.com>
* nit Signed-off-by: zhichao-aws <zhichaog@amazon.com>
* remove query tokens from user interface Signed-off-by: zhichao-aws <zhichaog@amazon.com>
* fix test Signed-off-by: zhichao-aws <zhichaog@amazon.com>
* tidy Signed-off-by: zhichao-aws <zhichaog@amazon.com>
* update function name Signed-off-by: zhichao-aws <zhichaog@amazon.com>
* add javadoc Signed-off-by: zhichao-aws <zhichaog@amazon.com>
* remove debug log including inference result Signed-off-by: zhichao-aws <zhichaog@amazon.com>
* make query text and model id required Signed-off-by: zhichao-aws <zhichaog@amazon.com>
* minor changes based on comments Signed-off-by: zhichao-aws <zhichaog@amazon.com>
* add locale to String.format Signed-off-by: zhichao-aws <zhichaog@amazon.com>
* update mock model url Signed-off-by: zhichao-aws <zhichaog@amazon.com>
--------- Signed-off-by: zhichao-aws <zhichaog@amazon.com> Signed-off-by: zane-neo <zaniu@amazon.com> Signed-off-by: xinyual <xinyual@amazon.com> Co-authored-by: zane-neo <zaniu@amazon.com> Co-authored-by: xinyual <xinyual@amazon.com> (cherry picked from commit 7bef7a0)
…343)
* Supporting sparse semantic retrieval in neural search (#333), with the same squashed commit list and sign-offs as above (cherry picked from commit 7bef7a0)
* Fix the compile error in [Backport/backport 333 to 2.x] (#344)
  * fix apache http version Signed-off-by: zhichao-aws <zhichaog@amazon.com>
  * add import Signed-off-by: zhichao-aws <zhichaog@amazon.com>
  --------- Signed-off-by: zhichao-aws <zhichaog@amazon.com>
--------- Signed-off-by: zhichao-aws <zhichaog@amazon.com> Co-authored-by: zhichao-aws <zhichaog@amazon.com>
Description
Support sparse encoding models (e.g. SPLADE) in OpenSearch. This PR introduces a new sparse encoding ingest processor and a new sparse encoding query clause.
Related PR: opensearch-project/ml-commons#1301
Issues Resolved
#230
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.