Fix user defined preprocess function missing prediction issue #2418

zane-neo · 2024-05-08T14:11:43Z

Description

When a user-defined preprocess script is used to process inputs, predictions are missing for every input in the list except the first one.

Issues Resolved

#2417

Check List

- New functionality includes testing.
  - All tests pass
- New functionality has been documented.
  - New functionality has javadoc added
- Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: zane-neo <zaniu@amazon.com>

ylwu-amzn · 2024-05-09T00:12:27Z

...rithms/src/main/java/org/opensearch/ml/engine/algorithms/remote/RemoteConnectorExecutor.java

+            String preProcessFunction = predictAction.get().getPreProcessFunction();
+            if (preProcessFunction != null && !MLPreProcessFunction.contains(preProcessFunction)) {
+                // user defined preprocess script, this case, the chunk size is always equals to text docs length.
+                return Tuple.tuple(textDocsLength, 1);


Is this understanding correct?

textDocsLength is the number of chunks.

1 is the step, or chunk size?

If so, why hard-code the chunk size as 1? Issue #2417 is one example, but it is not the only case. For example, some users may process two documents in one chunk.

https://github.com/opensearch-project/ml-commons/blob/2.x/docs/remote_inference_blueprints/bedrock_connector_titan_multimodal_embedding_blueprint.md

textDocsLength is the number of chunks, and 1 is the step size. When a user processes two documents in one chunk, they have to specify input_docs_processed_step_size, and in that case the code does not enter this branch. Multi-modal is one case where the user needs to handle two documents in one chunk; I've tested it and it works well.

Thanks for testing the multi-modal case.
It's possible to have a step size > 1 without input_docs_processed_step_size. For example, before the async HTTP client change, a user could configure the pre_process function to read three documents and construct "inputs": [doc1, doc2, doc3]; when the response came back, processing would move on to the next 3 docs.

This is a rare case, and we have a workaround, which is to add the input_docs_processed_step_size configuration.

Thanks, based on our discussion, this sounds good
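To make the chunk/step discussion above concrete, here is a minimal, self-contained sketch of how a (chunk count, step size) pair could be derived and then used to slice the input docs. It is not the code from RemoteConnectorExecutor: the class name, the `plan` and `slices` helpers, the stand-in set of built-in function names, the order of the branches, and the fallback of sending the whole list in one chunk are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class ChunkPlanSketch {
    // Hypothetical stand-in for the check done by MLPreProcessFunction.contains(...).
    static final Set<String> BUILT_IN = Set.of("connector.pre_process.openai.embedding");

    /**
     * Returns {chunkCount, stepSize}.
     * stepSizeParam models input_docs_processed_step_size; null means "not configured".
     */
    static int[] plan(int textDocsLength, String preProcessFunction, Integer stepSizeParam) {
        if (stepSizeParam != null) {
            // Explicit step size wins: ceil(textDocsLength / stepSize) chunks.
            int chunks = (textDocsLength + stepSizeParam - 1) / stepSizeParam;
            return new int[] { chunks, stepSizeParam };
        }
        if (preProcessFunction != null && !BUILT_IN.contains(preProcessFunction)) {
            // User-defined script: one doc per chunk, as in the diff above.
            return new int[] { textDocsLength, 1 };
        }
        // Assumed fallback: the whole list goes out as a single chunk.
        return new int[] { 1, textDocsLength };
    }

    /** Splits docs into consecutive slices according to a {chunkCount, stepSize} plan. */
    static List<List<String>> slices(List<String> docs, int[] plan) {
        List<List<String>> out = new ArrayList<>();
        for (int i = 0; i < plan[0]; i++) {
            int from = i * plan[1];
            int to = Math.min(from + plan[1], docs.size());
            out.add(docs.subList(from, to));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> docs = List.of("d1", "d2", "d3", "d4");
        // User-defined script, no step size configured: every doc gets its own call.
        System.out.println(slices(docs, plan(4, "my custom script", null)));
        // prints [[d1], [d2], [d3], [d4]]
        // Workaround from the thread: configure the step size explicitly.
        System.out.println(slices(docs, plan(4, "my custom script", 2)));
        // prints [[d1, d2], [d3, d4]]
    }
}
```

Under these assumptions, a user-defined script that expects several docs per call only sees them together when the step size is configured, which is the workaround mentioned in the thread.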

* Fix user defined preprocess function missing prediction issue
  Signed-off-by: zane-neo <zaniu@amazon.com>
* Add validation to predictAction in connector
  Signed-off-by: zane-neo <zaniu@amazon.com>
* Add check to multi-modal non image case
  Signed-off-by: zane-neo <zaniu@amazon.com>
* add UTs
  Signed-off-by: zane-neo <zaniu@amazon.com>
* format code
  Signed-off-by: zane-neo <zaniu@amazon.com>

---------

Signed-off-by: zane-neo <zaniu@amazon.com>
(cherry picked from commit 89f23d2)

Fix user defined preprocess function missing prediction issue

b6fa75f

Signed-off-by: zane-neo <zaniu@amazon.com>

zane-neo requested review from b4sjoo, dhrubo-os, jngz-es, model-collapse, rbhavna, ylwu-amzn, Zhangxunmt, austintlee, HenryL27, sam-herman and xinyual as code owners May 8, 2024 14:11

Add validation to predictAction in connector

284bb98

Signed-off-by: zane-neo <zaniu@amazon.com>

Add check to multi-modal non image case

410f42b

Signed-off-by: zane-neo <zaniu@amazon.com>

ylwu-amzn reviewed May 9, 2024

zane-neo mentioned this pull request May 9, 2024

[BUG] BWC issue with async http client change #2417

Closed

dhrubo-os approved these changes May 9, 2024

xinyual approved these changes May 9, 2024

zane-neo merged commit 89f23d2 into opensearch-project:main May 9, 2024
7 of 10 checks passed

zane-neo added the backport 2.x label May 9, 2024

opensearch-trigger-bot bot mentioned this pull request May 9, 2024

[Backport 2.x] Fix user defined preprocess function missing prediction issue #2426

Merged

zane-neo added the backport 2.14 label May 9, 2024

opensearch-trigger-bot bot mentioned this pull request May 9, 2024

[Backport 2.14] Fix user defined preprocess function missing prediction issue #2427

Merged
