-
Notifications
You must be signed in to change notification settings - Fork 138
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix user defined preprocess function missing prediction issue #2418
Fix user defined preprocess function missing prediction issue #2418
Conversation
Signed-off-by: zane-neo <zaniu@amazon.com>
Signed-off-by: zane-neo <zaniu@amazon.com>
Signed-off-by: zane-neo <zaniu@amazon.com>
String preProcessFunction = predictAction.get().getPreProcessFunction(); | ||
if (preProcessFunction != null && !MLPreProcessFunction.contains(preProcessFunction)) { | ||
// user defined preprocess script, this case, the chunk size is always equals to text docs length. | ||
return Tuple.tuple(textDocsLength, 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this understanding correct ?
textDocsLength
means how many chunks1
means step or chunk size ?
If correct, why hard code the chunk size as 1 ? This issue #2417 is an example, it's not the only case. For example some user may process two documents in one chunk.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
textDocsLength is chunks, 1 means step. When user process two documents in one chunk, user has to specify the input_docs_processed_step_size
, in this case, this won't enter this branch. Multi-modal is a case that user needs to handle two documents in one chunk, I've tested this case and it works well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for testing multi-modal case.
It's possible that step size > 1 and no input_docs_processed_step_size
. For example, before async http client, user can confiture pre_process function to read three documents and construct "inputs": [doc1, doc2, doc3]
, when response comes back, will move to next 3 docs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a rare case, and we have workaround which is to add input_docs_processed_step_size
configuration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, based on our discussion, this sounds good
* Fix user defined preprocess function missing prediction issue Signed-off-by: zane-neo <zaniu@amazon.com> * Add validation to predictAction in connector Signed-off-by: zane-neo <zaniu@amazon.com> * Add check to multi-modal non image case Signed-off-by: zane-neo <zaniu@amazon.com> * add UTs Signed-off-by: zane-neo <zaniu@amazon.com> * format code Signed-off-by: zane-neo <zaniu@amazon.com> --------- Signed-off-by: zane-neo <zaniu@amazon.com> (cherry picked from commit 89f23d2)
* Fix user defined preprocess function missing prediction issue Signed-off-by: zane-neo <zaniu@amazon.com> * Add validation to predictAction in connector Signed-off-by: zane-neo <zaniu@amazon.com> * Add check to multi-modal non image case Signed-off-by: zane-neo <zaniu@amazon.com> * add UTs Signed-off-by: zane-neo <zaniu@amazon.com> * format code Signed-off-by: zane-neo <zaniu@amazon.com> --------- Signed-off-by: zane-neo <zaniu@amazon.com> (cherry picked from commit 89f23d2)
…#2426) * Fix user defined preprocess function missing prediction issue Signed-off-by: zane-neo <zaniu@amazon.com> * Add validation to predictAction in connector Signed-off-by: zane-neo <zaniu@amazon.com> * Add check to multi-modal non image case Signed-off-by: zane-neo <zaniu@amazon.com> * add UTs Signed-off-by: zane-neo <zaniu@amazon.com> * format code Signed-off-by: zane-neo <zaniu@amazon.com> --------- Signed-off-by: zane-neo <zaniu@amazon.com> (cherry picked from commit 89f23d2) Co-authored-by: zane-neo <zaniu@amazon.com>
…#2427) * Fix user defined preprocess function missing prediction issue Signed-off-by: zane-neo <zaniu@amazon.com> * Add validation to predictAction in connector Signed-off-by: zane-neo <zaniu@amazon.com> * Add check to multi-modal non image case Signed-off-by: zane-neo <zaniu@amazon.com> * add UTs Signed-off-by: zane-neo <zaniu@amazon.com> * format code Signed-off-by: zane-neo <zaniu@amazon.com> --------- Signed-off-by: zane-neo <zaniu@amazon.com> (cherry picked from commit 89f23d2) Co-authored-by: zane-neo <zaniu@amazon.com>
…arch-project#2418) (opensearch-project#2426) * Fix user defined preprocess function missing prediction issue Signed-off-by: zane-neo <zaniu@amazon.com> * Add validation to predictAction in connector Signed-off-by: zane-neo <zaniu@amazon.com> * Add check to multi-modal non image case Signed-off-by: zane-neo <zaniu@amazon.com> * add UTs Signed-off-by: zane-neo <zaniu@amazon.com> * format code Signed-off-by: zane-neo <zaniu@amazon.com> --------- Signed-off-by: zane-neo <zaniu@amazon.com> (cherry picked from commit 89f23d2) Co-authored-by: zane-neo <zaniu@amazon.com>
Description
When users use self defined preprocess scripts to process inputs, there'll be missing predictions for all inputs in the list except the first one.
Issues Resolved
#2417
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.