Add batch_size param for text_embedding processor #1298

Conversation

Contributor

@YeonghyeonKO YeonghyeonKO commented Nov 17, 2024

Description

Although the documentation for the text_embedding processor in ingest pipelines describes a batch_size parameter, the corresponding interface in the opensearch-java client does not include it. Since OpenSearch 2.16, ingest processors that inherit from AbstractBatchingProcessor can support batch inference (opensearch-project/neural-search#820). Until now, the OpenSearch Java client has not supported the optional batch_size parameter when defining a text_embedding processor.

Since @miguel-vila's contribution adding TextEmbeddingProcessor was merged, opensearch-project/neural-search has undergone another significant change (opensearch-project/neural-search#820). In line with that change, this PR updates the text_embedding processor code in the opensearch-java client to accept batch_size.
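
For illustration only, here is a minimal, self-contained sketch of the idea: an optional batchSize on a processor builder that is rejected when it is not a positive integer and is emitted as batch_size only when set. The class TextEmbeddingProcessorSketch, its builder methods, and the exception messages are hypothetical stand-ins and are not the actual opensearch-java classes; where exactly the real client performs the check is also an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a text_embedding processor definition.
// This is NOT the opensearch-java TextEmbeddingProcessor class.
final class TextEmbeddingProcessorSketch {
    private final String modelId;
    private final Map<String, String> fieldMap;
    private final Integer batchSize; // null means "not set"; the server default applies

    private TextEmbeddingProcessorSketch(Builder builder) {
        this.modelId = builder.modelId;
        this.fieldMap = new LinkedHashMap<>(builder.fieldMap);
        this.batchSize = builder.batchSize;
    }

    // Builds the processor body; batch_size is written only when it was provided.
    Map<String, Object> toMap() {
        Map<String, Object> body = new LinkedHashMap<>();
        body.put("model_id", modelId);
        body.put("field_map", fieldMap);
        if (batchSize != null) {
            body.put("batch_size", batchSize);
        }
        return body;
    }

    static final class Builder {
        private String modelId;
        private final Map<String, String> fieldMap = new LinkedHashMap<>();
        private Integer batchSize;

        Builder modelId(String value) {
            this.modelId = value;
            return this;
        }

        Builder fieldMap(String sourceField, String targetField) {
            this.fieldMap.put(sourceField, targetField);
            return this;
        }

        // Optional parameter: null is allowed, zero and negative values are rejected.
        Builder batchSize(Integer value) {
            if (value != null && value <= 0) {
                throw new IllegalArgumentException("batchSize must be a positive integer");
            }
            this.batchSize = value;
            return this;
        }

        TextEmbeddingProcessorSketch build() {
            if (modelId == null) {
                throw new IllegalArgumentException("modelId is required");
            }
            return new TextEmbeddingProcessorSketch(this);
        }
    }

    public static void main(String[] args) {
        TextEmbeddingProcessorSketch processor = new Builder()
            .modelId("my-model-id")                  // placeholder model ID
            .fieldMap("passage_text", "passage_embedding")
            .batchSize(8)
            .build();
        System.out.println(processor.toMap());
    }
}
```

Keeping batchSize as a nullable Integer lets callers omit it entirely, which matches the parameter being optional in the processor definition.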

Issues Resolved

This PR is related to #1297


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Collaborator

@Xtansia Xtansia left a comment

Thanks for the contribution @YeonghyeonKO!
Just a few things to address:

Signed-off-by: YeonghyeonKO <dk02315@gmail.com>
@YeonghyeonKO YeonghyeonKO force-pushed the feat/batchSize-in-TextEmbeddingProcessor branch from 90b81ab to 7f2aaa3 Compare November 18, 2024 12:55
Signed-off-by: YeonghyeonKO <dk02315@gmail.com>
@YeonghyeonKO YeonghyeonKO requested a review from Xtansia November 18, 2024 13:02
YeonghyeonKO and others added 2 commits November 18, 2024 22:06
Signed-off-by: YeonghyeonKO <dk02315@gmail.com>
Signed-off-by: Thomas Farr <tsfarr@amazon.com>
@Xtansia Xtansia merged commit 6c3e68f into opensearch-project:main Nov 18, 2024
55 of 56 checks passed
@Xtansia Xtansia added the backport 2.x Backport to 2.x branch label Nov 18, 2024
opensearch-trigger-bot bot pushed a commit that referenced this pull request Nov 18, 2024
* Add batchSize parameter for text_embedding processor

Signed-off-by: YeonghyeonKO <dk02315@gmail.com>

* throw IllegalArgumentException when batchSize is not a positive integer

Signed-off-by: YeonghyeonKO <dk02315@gmail.com>

* test: add test cases for BatchSize param

Signed-off-by: YeonghyeonKO <dk02315@gmail.com>

* test: exception when batchSize is zero or negative integer

Signed-off-by: YeonghyeonKO <dk02315@gmail.com>

* refactor: use assertNotNull for readability & convention

Signed-off-by: YeonghyeonKO <dk02315@gmail.com>

* update CHANGELOG about #1298 PR

Signed-off-by: YeonghyeonKO <dk02315@gmail.com>

* apply code convention to keep the codes spotless

Signed-off-by: YeonghyeonKO <dk02315@gmail.com>

---------

Signed-off-by: YeonghyeonKO <dk02315@gmail.com>
Signed-off-by: Thomas Farr <tsfarr@amazon.com>
Co-authored-by: Thomas Farr <tsfarr@amazon.com>
(cherry picked from commit 6c3e68f)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Xtansia added a commit that referenced this pull request Nov 18, 2024
* Add batchSize parameter for text_embedding processor

* throw IllegalArgumentException when batchSize is not a positive integer

* test: add test cases for BatchSize param

* test: exception when batchSize is zero or negative integer

* refactor: use assertNotNull for readability & convention

* update CHANGELOG about #1298 PR

* apply code convention to keep the codes spotless

---------

(cherry picked from commit 6c3e68f)

Signed-off-by: YeonghyeonKO <dk02315@gmail.com>
Signed-off-by: Thomas Farr <tsfarr@amazon.com>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Thomas Farr <tsfarr@amazon.com>