[FEATURE] Support batch ingestion in TextEmbeddingProcessor & SparseEncodingProcessor #743
Comments
Hi @chishui , could you please provide an example API request body to create a batched ingest processor? |
We are not adding this feature for |
@zhichao-aws there are no changes to how TextEmbeddingProcessor and SparseEncodingProcessor are created. It's only when user uses |
@martin-gaievski we don't have a plan to support |
Updated description in "What solution would you like?" section. |
@chishui given that the feature is merged across all the components, what is the final benchmarking number for the batch ingestion? |
@chishui can you attach the documentation issue here for tracking purpose as this feature. |
@chishui I looked into the benchmarks here: opensearch-project/OpenSearch#12457 (comment), when we say there is a significant improvement on throughput I see it is with a gpu instance and not with any other model. Is this correct? |
This is the recent benchmark I redid with a neural sparse model hosted on SageMaker
SageMaker
|
We did benchmark on SageMaker, OpenAI and Cohere, and all results are listed in the comment you linked. We saw throughput improvements for all of these services. The GPU instance mentioned was the one SageMaker used: we can choose the instance type for SageMaker, but we have no idea what type of GPU instance OpenAI and Cohere used. |
|
Is your feature request related to a problem?
RFC: opensearch-project/OpenSearch#12457
We implemented batch ingestion logic in OpenSearch core in version 2.14. Now we want to enable the batch ingestion capability in the neural-search processors TextEmbeddingProcessor and SparseEncodingProcessor, so that we can better utilize the remote ML server's GPU capacity and accelerate the ingestion process. Based on our benchmark, batching can reduce total ingestion time by 77% without seeing throttling errors (P90, SageMaker); please refer to here to see the benchmark results.
What solution would you like?
In InferenceProcessor, override Processor's batchExecute API and add a default implementation to combine List<String> inferenceText from multiple docs, then reuse mlCommonsClientAccessor.inferenceSentences and mlCommonsClientAccessor.inferenceSentencesWithMapResult. After getting the inference results, map them to each doc and update the docs.
(This was originally proposed in ml-commons. But as @ylwu-amzn suggested, we can reuse input_docs_processed_step_size as the max batch size, so it makes more sense to sort the docs in neural-search, as we can ensure that we won't sort docs from TextImageEmbeddingProcessor.)
What alternatives have you considered?
N/A
Do you have any additional context?
N/A
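For readers following the proposal under "What solution would you like?", here is a minimal Java sketch of the combine / infer once / map back flow. It is only an illustration under assumptions: `Doc`, this `batchExecute` signature, and the `inference` function are hypothetical stand-ins, not the actual OpenSearch `Processor` API or the ml-commons client accessor, whose real signatures are not reproduced here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

class BatchInferenceSketch {

    /** Hypothetical stand-in for an ingest document: an id plus the texts to embed. */
    static final class Doc {
        final String id;
        final List<String> inferenceTexts;
        final List<float[]> embeddings = new ArrayList<>();

        Doc(String id, List<String> inferenceTexts) {
            this.id = id;
            this.inferenceTexts = inferenceTexts;
        }
    }

    /**
     * Combines the inference texts of all docs, runs ONE inference call,
     * then hands each doc its own slice of the results.
     * The inference function is assumed to return exactly one vector per
     * input text, in the same order as the input.
     */
    static void batchExecute(List<Doc> docs,
                             Function<List<String>, List<float[]>> inference) {
        // 1. Flatten the texts from every doc into a single list.
        List<String> combined = new ArrayList<>();
        for (Doc doc : docs) {
            combined.addAll(doc.inferenceTexts);
        }
        if (combined.isEmpty()) {
            return; // nothing to infer
        }

        // 2. One remote call for the whole batch instead of one per doc.
        List<float[]> results = inference.apply(combined);
        if (results.size() != combined.size()) {
            throw new IllegalStateException(
                "expected " + combined.size() + " results, got " + results.size());
        }

        // 3. Map results back: each doc takes as many vectors as it contributed texts.
        int offset = 0;
        for (Doc doc : docs) {
            int count = doc.inferenceTexts.size();
            doc.embeddings.addAll(results.subList(offset, offset + count));
            offset += count;
        }
    }
}
```

The offset bookkeeping in step 3 relies on the results preserving input order. If docs were sorted before inference, as the parenthetical in the proposal discusses, the original positions would have to be remembered so results can be mapped back to the right doc.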