Fix user defined preprocess function missing prediction issue #2418

Merged 5 commits on May 9, 2024

zane-neo (Collaborator) commented May 8, 2024

Description

When users process inputs with a user-defined preprocess script, predictions are missing for every input in the list except the first one.

Issues Resolved

#2417

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: zane-neo <zaniu@amazon.com>
String preProcessFunction = predictAction.get().getPreProcessFunction();
if (preProcessFunction != null && !MLPreProcessFunction.contains(preProcessFunction)) {
    // user defined preprocess script, this case, the chunk size is always equals to text docs length.
    return Tuple.tuple(textDocsLength, 1);

Collaborator

Is this understanding correct?

  1. textDocsLength means the number of chunks
  2. 1 means the step, or chunk size?

If so, why hard-code the chunk size as 1? Issue #2417 is one example, but it's not the only case. For example, some users may process two documents in one chunk.

Collaborator Author

textDocsLength is the number of chunks, and 1 is the step. When a user processes two documents in one chunk, the user has to specify input_docs_processed_step_size, and in that case the code won't enter this branch. Multi-modal is a case where the user needs to handle two documents in one chunk; I've tested this case and it works well.
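
The branch order described above can be sketched roughly as follows. This is a minimal illustration, not the ml-commons implementation: the class `ChunkPlanSketch`, the method `plan`, the built-in function name, and the final fallback (one batch containing every doc) are all assumptions made for the example. Only the parameter name `input_docs_processed_step_size` and the `(textDocsLength, 1)` result come from this thread.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class ChunkPlanSketch {
    // Parameter name taken from the discussion above.
    static final String STEP_SIZE_KEY = "input_docs_processed_step_size";

    // Hypothetical stand-in for the set of built-in preprocess function names.
    static final Set<String> BUILT_IN_FUNCTIONS =
        new HashSet<>(Arrays.asList("hypothetical.built_in.embedding"));

    // Returns {chunks, step}: how many requests to send and how many docs each one covers.
    static int[] plan(int textDocsLength, Map<String, String> parameters, String preProcessFunction) {
        // An explicit step size wins, so the user-defined-script branch is never reached.
        if (parameters != null && parameters.containsKey(STEP_SIZE_KEY)) {
            int step = Integer.parseInt(parameters.get(STEP_SIZE_KEY));
            if (step <= 0) {
                throw new IllegalArgumentException(STEP_SIZE_KEY + " must be positive");
            }
            int chunks = (textDocsLength + step - 1) / step; // ceiling division
            return new int[] { chunks, step };
        }
        // User-defined script without a step size: one doc per chunk.
        if (preProcessFunction != null && !BUILT_IN_FUNCTIONS.contains(preProcessFunction)) {
            return new int[] { textDocsLength, 1 };
        }
        // Assumed fallback for built-in functions: a single batch with every doc.
        return new int[] { 1, textDocsLength };
    }
}
```

Under these assumptions, three docs with a custom script and no step size give `{3, 1}`, while five docs with a step size of 2 give `{3, 2}`.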

Collaborator

Thanks for testing the multi-modal case.
It's possible to have a step size > 1 without input_docs_processed_step_size. For example, before the async HTTP client, a user could configure the pre_process function to read three documents and construct "inputs": [doc1, doc2, doc3]; when the response came back, processing would move on to the next three docs.

Collaborator Author

This is a rare case, and we have a workaround: adding the input_docs_processed_step_size configuration.
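
To make the three-documents-per-request scenario concrete, here is a small sketch of how a step size walks through the input docs. `StepSizeSketch` and `chunk` are hypothetical names used only for illustration; the actual connector code may slice the docs differently.

```java
import java.util.ArrayList;
import java.util.List;

class StepSizeSketch {
    // Splits docs into consecutive groups of at most stepSize docs each.
    static List<List<String>> chunk(List<String> docs, int stepSize) {
        if (stepSize <= 0) {
            throw new IllegalArgumentException("stepSize must be positive");
        }
        List<List<String>> chunks = new ArrayList<>();
        for (int start = 0; start < docs.size(); start += stepSize) {
            // The last group may be shorter when the doc count is not a multiple of stepSize.
            int end = Math.min(start + stepSize, docs.size());
            chunks.add(new ArrayList<>(docs.subList(start, end)));
        }
        return chunks;
    }
}
```

With seven docs and a step size of 3, this yields three groups: two of three docs and a final group holding the one remaining doc.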

Collaborator

Thanks. Based on our discussion, this sounds good.

@zane-neo zane-neo merged commit 89f23d2 into opensearch-project:main May 9, 2024
7 of 10 checks passed
opensearch-trigger-bot bot pushed a commit that referenced this pull request May 9, 2024
* Fix user defined preprocess function missing prediction issue

Signed-off-by: zane-neo <zaniu@amazon.com>

* Add validation to predictAction in connector

Signed-off-by: zane-neo <zaniu@amazon.com>

* Add check to multi-modal non image case

Signed-off-by: zane-neo <zaniu@amazon.com>

* add UTs

Signed-off-by: zane-neo <zaniu@amazon.com>

* format code

Signed-off-by: zane-neo <zaniu@amazon.com>

---------

Signed-off-by: zane-neo <zaniu@amazon.com>
(cherry picked from commit 89f23d2)
zane-neo added a commit that referenced this pull request May 9, 2024
…#2426)

* Fix user defined preprocess function missing prediction issue

Signed-off-by: zane-neo <zaniu@amazon.com>

* Add validation to predictAction in connector

Signed-off-by: zane-neo <zaniu@amazon.com>

* Add check to multi-modal non image case

Signed-off-by: zane-neo <zaniu@amazon.com>

* add UTs

Signed-off-by: zane-neo <zaniu@amazon.com>

* format code

Signed-off-by: zane-neo <zaniu@amazon.com>

---------

Signed-off-by: zane-neo <zaniu@amazon.com>
(cherry picked from commit 89f23d2)

Co-authored-by: zane-neo <zaniu@amazon.com>
zane-neo added a commit that referenced this pull request May 9, 2024
…#2427)

* Fix user defined preprocess function missing prediction issue

Signed-off-by: zane-neo <zaniu@amazon.com>

* Add validation to predictAction in connector

Signed-off-by: zane-neo <zaniu@amazon.com>

* Add check to multi-modal non image case

Signed-off-by: zane-neo <zaniu@amazon.com>

* add UTs

Signed-off-by: zane-neo <zaniu@amazon.com>

* format code

Signed-off-by: zane-neo <zaniu@amazon.com>

---------

Signed-off-by: zane-neo <zaniu@amazon.com>
(cherry picked from commit 89f23d2)

Co-authored-by: zane-neo <zaniu@amazon.com>
dhrubo-os pushed a commit to dhrubo-os/ml-commons that referenced this pull request May 17, 2024
…arch-project#2418) (opensearch-project#2426)

* Fix user defined preprocess function missing prediction issue

Signed-off-by: zane-neo <zaniu@amazon.com>

* Add validation to predictAction in connector

Signed-off-by: zane-neo <zaniu@amazon.com>

* Add check to multi-modal non image case

Signed-off-by: zane-neo <zaniu@amazon.com>

* add UTs

Signed-off-by: zane-neo <zaniu@amazon.com>

* format code

Signed-off-by: zane-neo <zaniu@amazon.com>

---------

Signed-off-by: zane-neo <zaniu@amazon.com>
(cherry picked from commit 89f23d2)

Co-authored-by: zane-neo <zaniu@amazon.com>
4 participants