Fix user defined preprocess function missing prediction issue #2418
Changes from all commits: b6fa75f, 284bb98, 410f42b, 3ec9934, ae74167
@@ -11,6 +11,7 @@
 import java.util.HashMap;
 import java.util.List;
 import java.util.Map;
+import java.util.Optional;
 import java.util.concurrent.ConcurrentHashMap;
 import java.util.concurrent.CountDownLatch;
 import java.util.concurrent.atomic.AtomicReference;
@@ -27,6 +28,8 @@
 import org.opensearch.core.xcontent.NamedXContentRegistry;
 import org.opensearch.ml.common.FunctionName;
 import org.opensearch.ml.common.connector.Connector;
+import org.opensearch.ml.common.connector.ConnectorAction;
+import org.opensearch.ml.common.connector.MLPreProcessFunction;
 import org.opensearch.ml.common.dataset.MLInputDataset;
 import org.opensearch.ml.common.dataset.TextDocsInputDataSet;
 import org.opensearch.ml.common.dataset.remote.RemoteInferenceInputDataSet;
@@ -99,6 +102,15 @@ private Tuple<Integer, Integer> calculateChunkSize(TextDocsInputDataSet textDocs
                 return Tuple.tuple(textDocsLength / stepSize + 1, stepSize);
             }
         } else {
+            Optional<ConnectorAction> predictAction = getConnector().findPredictAction();
+            if (predictAction.isEmpty()) {
+                throw new IllegalArgumentException("no predict action found");
+            }
+            String preProcessFunction = predictAction.get().getPreProcessFunction();
+            if (preProcessFunction != null && !MLPreProcessFunction.contains(preProcessFunction)) {
+                // user defined preprocess script, this case, the chunk size is always equals to text docs length.
+                return Tuple.tuple(textDocsLength, 1);
+            }
             // consider as batch.
             return Tuple.tuple(1, textDocsLength);
         }

Inline review thread on the line return Tuple.tuple(textDocsLength, 1);:

- Is this understanding correct? If correct, why hard-code the chunk size as 1? Issue #2417 is one example, but it is not the only case; for example, some users may process two documents in one chunk.
- textDocsLength is the number of chunks, and 1 means the step. When a user processes two documents in one chunk, the user has to specify the …
- Thanks for testing the multi-modal case.
- This is a rare case, and we have a workaround, which is to add …
- Thanks, based on our discussion, this sounds good.
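One reply in the thread above notes that in Tuple.tuple(textDocsLength, 1) the first value is the number of chunks and the second is the step. As a rough, self-contained illustration of that pairing (not ml-commons code; the split helper and class name below are made up for this sketch), here is how a (chunkCount, stepSize) pair partitions documents into per-request batches:

import java.util.ArrayList;
import java.util.List;

public class ChunkingSketch {

    // Illustrative only: take `stepSize` documents for each of `chunkCount` chunks.
    static List<List<String>> split(List<String> docs, int chunkCount, int stepSize) {
        List<List<String>> chunks = new ArrayList<>();
        for (int i = 0; i < chunkCount; i++) {
            int from = i * stepSize;
            int to = Math.min(from + stepSize, docs.size());
            if (from >= to) {
                break; // no documents left for this chunk
            }
            chunks.add(docs.subList(from, to));
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<String> docs = List.of("doc1", "doc2", "doc3");
        // User defined preprocess script in the PR: (textDocsLength, 1) -> one document per request.
        System.out.println(split(docs, docs.size(), 1)); // [[doc1], [doc2], [doc3]]
        // Built-in preprocess function: (1, textDocsLength) -> all documents in one batched request.
        System.out.println(split(docs, 1, docs.size())); // [[doc1, doc2, doc3]]
    }
}

The second call mirrors the pre-existing batch behaviour; the first mirrors the new user-defined-script branch.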
Review comment: Can we have a unit test for this section?
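A test along the lines below could answer that question. It is only a sketch, not a patch to the PR: the decide(...) helper, the class name, and the sample preprocess-function strings are hypothetical stand-ins for the private calculateChunkSize branch and for MLPreProcessFunction.contains(...), since the connector and executor mocking needed to call the real method is not shown in this diff, and the Tuple import is assumed to be the same org.opensearch.common.collect.Tuple used by the change.

import static org.junit.Assert.assertEquals;

import org.junit.Test;
import org.opensearch.common.collect.Tuple;

public class ChunkSizeDecisionTests {

    // Hypothetical stand-in for the new branch in calculateChunkSize:
    // `isBuiltIn` plays the role of MLPreProcessFunction.contains(preProcessFunction).
    private Tuple<Integer, Integer> decide(String preProcessFunction, boolean isBuiltIn, int textDocsLength) {
        if (preProcessFunction != null && !isBuiltIn) {
            // user defined preprocess script: one document per request
            return Tuple.tuple(textDocsLength, 1);
        }
        // built-in (or absent) preprocess function: a single batched request
        return Tuple.tuple(1, textDocsLength);
    }

    @Test
    public void userDefinedPreprocessFunction_oneDocPerRequest() {
        Tuple<Integer, Integer> chunkSizeAndStep = decide("my_custom_painless_script", false, 3);
        assertEquals(3, chunkSizeAndStep.v1().intValue()); // chunks
        assertEquals(1, chunkSizeAndStep.v2().intValue()); // step
    }

    @Test
    public void builtInPreprocessFunction_singleBatch() {
        Tuple<Integer, Integer> chunkSizeAndStep = decide("connector.pre_process.openai.embedding", true, 3);
        assertEquals(1, chunkSizeAndStep.v1().intValue());
        assertEquals(3, chunkSizeAndStep.v2().intValue());
    }
}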