Supporting sparse semantic retrieval in neural search #333

Merged

Conversation

zhichao-aws
Member

@zhichao-aws zhichao-aws commented Sep 22, 2023

Description

Support using sparse encoding models (e.g. SPLADE) in OpenSearch. We introduce a new sparse encoding document ingest processor and a new sparse encoding query clause.

related PR: opensearch-project/ml-commons#1301

Issues Resolved

#230

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed as per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@zhichao-aws
Member Author

Some integ tests fail because they invoke the new sparse models in ml-commons, and the ml-commons PR hasn't been merged yet.

@zhichao-aws
Member Author

We have finalized our code; it is ready for review now.

@zhichao-aws
Member Author

The force push was to fix the DCO check.

zhichao-aws and others added 11 commits September 22, 2023 15:09
Signed-off-by: zhichao-aws <zhichaog@amazon.com>
Signed-off-by: zhichao-aws <zhichaog@amazon.com>
Signed-off-by: zane-neo <zaniu@amazon.com>
Signed-off-by: zane-neo <zaniu@amazon.com>
Signed-off-by: xinyual <xinyual@amazon.com>
Signed-off-by: xinyual <xinyual@amazon.com>
Signed-off-by: xinyual <xinyual@amazon.com>
Signed-off-by: xinyual <xinyual@amazon.com>
Signed-off-by: zane-neo <zaniu@amazon.com>
@NonNull final List<String> inputText,
@NonNull final ActionListener<List<Map<String, ?>>> listener
) {
retryableInferenceSentencesWithMapResult(modelId, inputText, 0, listener);
Collaborator

Why do we want to do no retries?

Member Author

I think we are staying consistent with the existing inferenceSentences method. @zane-neo Could you please help answer this?

Collaborator

The 0 here doesn't mean no retry; it is only an initial value. The counter is increased until it reaches the maximum retry count (3). This could be changed to count down in the future to make it more intuitive, though.
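
The counting scheme described in this reply can be sketched as follows. This is a minimal, self-contained illustration and not the plugin's code: `RetrySketch`, `call`, `callWithRetry`, and `MAX_RETRY` are hypothetical names, the limit of 3 is taken from the comment above, and retrying on every `RuntimeException` is a simplifying assumption.

```java
import java.util.function.Supplier;

final class RetrySketch {
    // Hypothetical constant; the review comment mentions a maximum of 3 retries.
    static final int MAX_RETRY = 3;

    /** Entry point: 0 is the initial retry count, not a request for "no retries". */
    static <T> T call(Supplier<T> action) {
        return callWithRetry(action, 0);
    }

    private static <T> T callWithRetry(Supplier<T> action, int retryTime) {
        try {
            return action.get();
        } catch (RuntimeException e) {
            // Assumption: every runtime failure is retryable in this sketch.
            if (retryTime < MAX_RETRY) {
                return callWithRetry(action, retryTime + 1);
            }
            throw e; // retries exhausted, surface the last failure
        }
    }
}
```

With this shape, an action that always fails is attempted four times in total: the initial call plus three retries.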

@@ -144,4 +175,20 @@ private List<List<Float>> buildVectorFromResponse(MLOutput mlOutput) {
return vector;
}

private List<Map<String, ?>> buildMapResultFromResponse(MLOutput mlOutput) {
final ModelTensorOutput modelTensorOutput = (ModelTensorOutput) mlOutput;
Member

Is this cast safe? Or should we check?

Member Author

@zhichao-aws zhichao-aws Sep 26, 2023

Same as the one above. Maybe we can keep the code consistent between the two use cases. If they need to be fixed, we can create another PR to fix them.

final ModelTensorOutput modelTensorOutput = (ModelTensorOutput) mlOutput;

Member

We can fix the other one in the future, but I want to understand how we know for certain this won't cause a ClassCastException on invalid input.

Member Author

@zane-neo could you please help answer this?

Collaborator

The two inference methods we created in neural-search (inferenceWithVectorResult and inferenceWithMapResult) are for text-embedding and sparse-embedding only, and they both return ModelTensorOutput. The latter could be expanded to other use cases in the future, e.g. remote inference, but in that case it would still return ModelTensorOutput. So it's safe to cast this output to ModelTensorOutput.
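
For readers following the cast discussion, a defensive variant could look like the sketch below. It is not what the PR merged. `MLOutput` and `ModelTensorOutput` here are hypothetical stand-ins declared locally so the example compiles on its own, and `OtherOutput` and `asTensorOutput` are invented for illustration.

```java
final class CastSketch {
    // Local stand-ins for the ml-commons types; hypothetical, for illustration only.
    interface MLOutput {}

    static final class ModelTensorOutput implements MLOutput {}

    static final class OtherOutput implements MLOutput {}

    /** Checks the runtime type first so a mismatch produces a descriptive error instead of a ClassCastException. */
    static ModelTensorOutput asTensorOutput(MLOutput mlOutput) {
        if (!(mlOutput instanceof ModelTensorOutput)) {
            String actual = mlOutput == null ? "null" : mlOutput.getClass().getSimpleName();
            throw new IllegalStateException("Expected ModelTensorOutput but got " + actual);
        }
        return (ModelTensorOutput) mlOutput;
    }
}
```

Whether the extra check is worth it depends on the guarantee given above: if both inference paths can only ever return ModelTensorOutput, the check never fires.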

Signed-off-by: zhichao-aws <zhichaog@amazon.com>
Signed-off-by: zhichao-aws <zhichaog@amazon.com>
Signed-off-by: zhichao-aws <zhichaog@amazon.com>
Signed-off-by: zhichao-aws <zhichaog@amazon.com>
Signed-off-by: zhichao-aws <zhichaog@amazon.com>
Signed-off-by: zhichao-aws <zhichaog@amazon.com>
Signed-off-by: zhichao-aws <zhichaog@amazon.com>
Signed-off-by: zhichao-aws <zhichaog@amazon.com>
@zhichao-aws
Member Author

@zhichao-aws can we add documentation on all the public classes and functions?

Sure

Comment on lines 91 to 92
out.writeOptionalString(queryText);
out.writeOptionalString(modelId);
Member

Why are these 2 optional? In the fromXContent it looks like they require a value.

Member Author

Good point. In the original design there were 2 query forms, and in the final proposal we keep only this one, but I didn't update the serialization here. Will fix this.
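
The difference between the optional and required forms can be illustrated with plain `java.io` streams. This is a sketch under stated assumptions: the real code uses OpenSearch's stream classes, which are not reproduced here, and `WireFormatSketch` with its four methods is hypothetical. The optional form is modeled as a presence flag followed by the value, which is an assumption about the general technique rather than a description of the OpenSearch wire format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Objects;

final class WireFormatSketch {
    /** Required form: a null value is rejected before anything is written. */
    static byte[] writeRequired(String queryText, String modelId) {
        Objects.requireNonNull(queryText, "queryText");
        Objects.requireNonNull(modelId, "modelId");
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeUTF(queryText);
            out.writeUTF(modelId);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Optional form: each value is preceded by a presence flag, so null survives the round trip. */
    static byte[] writeOptional(String queryText, String modelId) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            for (String value : new String[] { queryText, modelId }) {
                out.writeBoolean(value != null);
                if (value != null) {
                    out.writeUTF(value);
                }
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reader for the required form; must mirror writeRequired exactly. */
    static String[] readRequired(byte[] data) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            return new String[] { in.readUTF(), in.readUTF() };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reader for the optional form; must mirror writeOptional exactly. */
    static String[] readOptional(byte[] data) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            String queryText = in.readBoolean() ? in.readUTF() : null;
            String modelId = in.readBoolean() ? in.readUTF() : null;
            return new String[] { queryText, modelId };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The point relevant to the review comment is that the writer and the reader must agree on one form, and that the required form matches a parser that rejects missing values.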

Signed-off-by: zhichao-aws <zhichaog@amazon.com>
CHANGELOG.md Outdated
@@ -5,6 +5,7 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),

## [Unreleased 3.0](https://github.com/opensearch-project/neural-search/compare/2.x...HEAD)
### Features
Support sparse semantic retrieval by introducing `sparse_encoding` ingest processor and query builder ([#333](https://github.com/opensearch-project/neural-search/pull/333))
Member

I think we typically target one version per PR; if this is for 2.11, the entry should be under 2.x.

Member Author

Sure, will fix this.

Comment on lines +154 to +155
runtimeOnly group: 'com.google.code.gson', name: 'gson', version: '2.10.1'
runtimeOnly group: 'org.json', name: 'json', version: '20230227'
Member

confirming, I've seen same behavior while working on multimodal

Map<String, Object> sourceAndMetadataMap,
Map<String, Object> treeRes
public void doExecute(
IngestDocument ingestDocument,
Member

arguments should be final at least for public methods, that's the convention if I remember correctly

Member Author

Good point. Furthermore, we can mark the two inheriting processor classes final, since each ingest processor should have only one behavior.

Signed-off-by: zhichao-aws <zhichaog@amazon.com>
Map<String, Float> result = new HashMap<>();
for (Map.Entry<?, ?> entry : ((Map<?, ?>) uncastedMap).entrySet()) {
if (!String.class.isAssignableFrom(entry.getKey().getClass()) || !Number.class.isAssignableFrom(entry.getValue().getClass())) {
throw new IllegalArgumentException("The expected inference result is a Map with String keys and " + " Float values.");
Member

Why do we use "+" to concatenate the strings? You can just write everything as one string.

Member Author

I think the automatic tidy tool merged 2 lines. Will remove the "+".
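
The validation loop under review, with the message written as a single string, can be sketched as a standalone method. `TokenWeightSketch` and `toTokenWeights` are hypothetical names, and using `instanceof` in place of `isAssignableFrom` is a choice made for this example, not something taken from the PR.

```java
import java.util.HashMap;
import java.util.Map;

final class TokenWeightSketch {
    private static final String MESSAGE = "The expected inference result is a Map with String keys and Float values.";

    /** Converts an untyped map into token weights, rejecting any entry that is not String -> Number. */
    static Map<String, Float> toTokenWeights(Object uncastedMap) {
        if (!(uncastedMap instanceof Map)) {
            throw new IllegalArgumentException(MESSAGE);
        }
        Map<String, Float> result = new HashMap<>();
        for (Map.Entry<?, ?> entry : ((Map<?, ?>) uncastedMap).entrySet()) {
            // instanceof is also false for null keys and values, so no separate null check is needed.
            if (!(entry.getKey() instanceof String) || !(entry.getValue() instanceof Number)) {
                throw new IllegalArgumentException(MESSAGE);
            }
            result.put((String) entry.getKey(), ((Number) entry.getValue()).floatValue());
        }
        return result;
    }
}
```

Accepting any `Number` and narrowing with `floatValue()` lets results that arrive as `Double` or `Integer` pass validation.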

modelId(),
List.of(queryText),
ActionListener.wrap(mapResultList -> {
queryTokensSetOnce.set(TokenWeightUtil.fetchListOfTokenWeightMap(mapResultList).get(0));
Member

Is it safe to assume that the list will have at least one element, without checking that it's not empty?

Member Author

Good point. Will add the check at the util class
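
A guard of the kind discussed here might look like the following. This is a sketch only: the thread says the check will be added in the util class but does not show it, so `FirstResultSketch`, `firstTokenWeights`, and the error message are hypothetical.

```java
import java.util.List;
import java.util.Map;

final class FirstResultSketch {
    /** Returns the first token-weight map, failing with a clear message when the result list is null or empty. */
    static Map<String, Float> firstTokenWeights(List<Map<String, Float>> results) {
        if (results == null || results.isEmpty()) {
            throw new IllegalArgumentException("Inference returned no token-weight map for the query text.");
        }
        return results.get(0);
    }
}
```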

private static void validateForRewrite(String queryText, String modelId) {
if (StringUtils.isBlank(queryText) || StringUtils.isBlank(modelId)) {
throw new IllegalArgumentException(
QUERY_TEXT_FIELD.getPreferredName() + " and " + MODEL_ID_FIELD.getPreferredName() + " cannot be null."
Member

please use String.format for this error message

Member Author

Sure, I have rewritten all strings with more than 2 args to use String.format.
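
The requested change could look roughly like this. It is a sketch rather than the merged code: `MessageSketch` is hypothetical, the field names `query_text` and `model_id` are assumed values for the two `getPreferredName()` calls, and a small local `isBlank` replaces `StringUtils.isBlank` so the example has no dependencies. `Locale.ROOT` follows the later commit titled "add locale to String.format".

```java
import java.util.Locale;

final class MessageSketch {
    // Assumed field names; the thread only shows QUERY_TEXT_FIELD and MODEL_ID_FIELD.
    static final String QUERY_TEXT_FIELD = "query_text";
    static final String MODEL_ID_FIELD = "model_id";

    /** Builds the error message with String.format and an explicit locale. */
    static String requiredFieldsMessage() {
        return String.format(Locale.ROOT, "%s and %s cannot be null", QUERY_TEXT_FIELD, MODEL_ID_FIELD);
    }

    /** Local replacement for StringUtils.isBlank: true for null, empty, or whitespace-only input. */
    private static boolean isBlank(String value) {
        return value == null || value.trim().isEmpty();
    }

    static void validateForRewrite(String queryText, String modelId) {
        if (isBlank(queryText) || isBlank(modelId)) {
            throw new IllegalArgumentException(requiredFieldsMessage());
        }
    }
}
```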

Signed-off-by: zhichao-aws <zhichaog@amazon.com>
Signed-off-by: zhichao-aws <zhichaog@amazon.com>
Signed-off-by: zhichao-aws <zhichaog@amazon.com>
@zane-neo zane-neo merged commit 7bef7a0 into opensearch-project:main Sep 27, 2023
9 of 13 checks passed
@zane-neo zane-neo added the backport 2.x Label will add auto workflow to backport PR to 2.x branch label Sep 27, 2023
opensearch-trigger-bot bot pushed a commit that referenced this pull request Sep 27, 2023
* sparse mapper field and query builder

Signed-off-by: zhichao-aws <zhichaog@amazon.com>

* fix typo

Signed-off-by: zhichao-aws <zhichaog@amazon.com>

* Add map result support in neural search for non text embedding models

Signed-off-by: zane-neo <zaniu@amazon.com>

* Fix compilation failure issue

Signed-off-by: zane-neo <zaniu@amazon.com>

* Add more UTs

Signed-off-by: zane-neo <zaniu@amazon.com>

* add sparse encoding processor

Signed-off-by: xinyual <xinyual@amazon.com>

* add sparse encoding processor

Signed-off-by: xinyual <xinyual@amazon.com>

* remove guava in gradle

Signed-off-by: xinyual <xinyual@amazon.com>

* modify access control

Signed-off-by: xinyual <xinyual@amazon.com>

* Add map result support in neural search for non text embedding models

Signed-off-by: zane-neo <zaniu@amazon.com>

* Fix compilation failure issue

Signed-off-by: zane-neo <zaniu@amazon.com>

* change output logic

Signed-off-by: xinyual <xinyual@amazon.com>

* create abstract

Signed-off-by: xinyual <xinyual@amazon.com>

* create abstract proccesor

Signed-off-by: xinyual <xinyual@amazon.com>

* add abstract class

Signed-off-by: xinyual <xinyual@amazon.com>

* remove duplicate code

Signed-off-by: xinyual <xinyual@amazon.com>

* remove duplicate code

Signed-off-by: xinyual <xinyual@amazon.com>

* remove dl process

Signed-off-by: xinyual <xinyual@amazon.com>

* move static to abstract class

Signed-off-by: xinyual <xinyual@amazon.com>

* update query rewrite logic

Signed-off-by: zhichao-aws <zhichaog@amazon.com>

* modify header

Signed-off-by: zhichao-aws <zhichaog@amazon.com>

* merge conflict

Signed-off-by: xinyual <xinyual@amazon.com>

* delete index mapper, change to rank_features

Signed-off-by: zhichao-aws <zhichaog@amazon.com>

* remove unused import

Signed-off-by: zhichao-aws <zhichaog@amazon.com>

* list return result

Signed-off-by: zhichao-aws <zhichaog@amazon.com>

* refactor type and listTypeNestedMapKey, tidy

Signed-off-by: zhichao-aws <zhichaog@amazon.com>

* forbid nested input. tidy.

Signed-off-by: zhichao-aws <zhichaog@amazon.com>

* tidy

Signed-off-by: zhichao-aws <zhichaog@amazon.com>

* enable nested

Signed-off-by: zhichao-aws <zhichaog@amazon.com>

* fix test

Signed-off-by: zhichao-aws <zhichaog@amazon.com>

* Add ut it to sparse encoding processor (#6)

* fix original UT problem

Signed-off-by: xinyual <xinyual@amazon.com>

* add UT IT

Signed-off-by: xinyual <xinyual@amazon.com>

* add more UT

Signed-off-by: xinyual <xinyual@amazon.com>

* add more ut

Signed-off-by: xinyual <xinyual@amazon.com>

* fix typo error

Signed-off-by: xinyual <xinyual@amazon.com>

---------

Signed-off-by: xinyual <xinyual@amazon.com>

* utils, tidy

Signed-off-by: zhichao-aws <zhichaog@amazon.com>

* rename to sparse_encoding query

Signed-off-by: zhichao-aws <zhichaog@amazon.com>

* add validation and ut

Signed-off-by: zhichao-aws <zhichaog@amazon.com>

* sparse encoding query builder ut

Signed-off-by: zhichao-aws <zhichaog@amazon.com>

* rename

Signed-off-by: zhichao-aws <zhichaog@amazon.com>

* UT for utils

Signed-off-by: zhichao-aws <zhichaog@amazon.com>

* enrich sparse encoding IT mappings

Signed-off-by: zhichao-aws <zhichaog@amazon.com>

* add it

Signed-off-by: zhichao-aws <zhichaog@amazon.com>

* add it

Signed-off-by: zhichao-aws <zhichaog@amazon.com>

* add integ test

Signed-off-by: zhichao-aws <zhichaog@amazon.com>

* rename resource file

Signed-off-by: zhichao-aws <zhichaog@amazon.com>

* tidy

Signed-off-by: zhichao-aws <zhichaog@amazon.com>

* remove BoundedLinearQuery and TokenScoreUpperBound

Signed-off-by: zhichao-aws <zhichaog@amazon.com>

* tidy

Signed-off-by: zhichao-aws <zhichaog@amazon.com>

* add delta to loose the equal

Signed-off-by: zhichao-aws <zhichaog@amazon.com>

* move SparseEncodingQueryBuilder to upper level path

Signed-off-by: zhichao-aws <zhichaog@amazon.com>

* tidy

Signed-off-by: zhichao-aws <zhichaog@amazon.com>

* add it

Signed-off-by: zhichao-aws <zhichaog@amazon.com>

* Update src/main/java/org/opensearch/neuralsearch/ml/MLCommonsClientAccessor.java

Co-authored-by: zane-neo <zaniu@amazon.com>
Signed-off-by: zhichao-aws <zhichaog@amazon.com>

* Update src/main/java/org/opensearch/neuralsearch/util/TokenWeightUtil.java

Co-authored-by: zane-neo <zaniu@amazon.com>
Signed-off-by: zhichao-aws <zhichaog@amazon.com>

* restore gradle.propeties

Signed-off-by: zhichao-aws <zhichaog@amazon.com>

* add release notes

Signed-off-by: zhichao-aws <zhichaog@amazon.com>

* change field modifier to private for NLPProcessor

Signed-off-by: zhichao-aws <zhichaog@amazon.com>

* add comments

Signed-off-by: zhichao-aws <zhichaog@amazon.com>

* use StringUtils to check

Signed-off-by: zhichao-aws <zhichaog@amazon.com>

* null check

Signed-off-by: zhichao-aws <zhichaog@amazon.com>

* modify changelog

Signed-off-by: zhichao-aws <zhichaog@amazon.com>

* nit

Signed-off-by: zhichao-aws <zhichaog@amazon.com>

* nit

Signed-off-by: zhichao-aws <zhichaog@amazon.com>

* remove query tokens from user interface

Signed-off-by: zhichao-aws <zhichaog@amazon.com>

* fix test

Signed-off-by: zhichao-aws <zhichaog@amazon.com>

* tidy

Signed-off-by: zhichao-aws <zhichaog@amazon.com>

* update function name

Signed-off-by: zhichao-aws <zhichaog@amazon.com>

* add javadoc

Signed-off-by: zhichao-aws <zhichaog@amazon.com>

* remove debug log including inference result

Signed-off-by: zhichao-aws <zhichaog@amazon.com>

* make query text and model id required

Signed-off-by: zhichao-aws <zhichaog@amazon.com>

* minor changes based on comments

Signed-off-by: zhichao-aws <zhichaog@amazon.com>

* add locale to String.format

Signed-off-by: zhichao-aws <zhichaog@amazon.com>

* update mock model url

Signed-off-by: zhichao-aws <zhichaog@amazon.com>

---------

Signed-off-by: zhichao-aws <zhichaog@amazon.com>
Signed-off-by: zane-neo <zaniu@amazon.com>
Signed-off-by: xinyual <xinyual@amazon.com>
Co-authored-by: zane-neo <zaniu@amazon.com>
Co-authored-by: xinyual <xinyual@amazon.com>
(cherry picked from commit 7bef7a0)
zhichao-aws added a commit to zhichao-aws/neural-search that referenced this pull request Sep 27, 2023
…ject#333)

zane-neo pushed a commit that referenced this pull request Sep 27, 2023
…343)

* Supporting sparse semantic retrieval in neural search (#333)

* Fix the compile error in [Backport/backport 333 to 2.x] (#344)

* fix apache http version

Signed-off-by: zhichao-aws <zhichaog@amazon.com>

* add import

Signed-off-by: zhichao-aws <zhichaog@amazon.com>

---------

Signed-off-by: zhichao-aws <zhichaog@amazon.com>

---------

Signed-off-by: zhichao-aws <zhichaog@amazon.com>
Co-authored-by: zhichao-aws <zhichaog@amazon.com>
Labels
backport 2.x Label will add auto workflow to backport PR to 2.x branch
6 participants