
.Net: Batch OpenAI embeddings request #3295

Closed
wants to merge 11 commits into from
Conversation

@anthonypuppo (Contributor) commented Oct 25, 2023

Motivation and Context

The OpenAI embeddings endpoint and text-embedding-ada-002 accept either a single item or an array of items. Rather than send text inputs one by one, the request should send them all at once for improved performance.

From MS docs:

OpenAI currently allows a larger number of array inputs with text-embedding-ada-002. Azure OpenAI currently supports input arrays up to 16 for text-embedding-ada-002 (Version 2). Both require the max input token limit per API request to remain under 8191 for this model.
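Both limits quoted above matter when packing a batch: the input-count cap and the per-request token budget. A rough sketch of greedy batch splitting follows (Python here for illustration, since the connector itself is C#; the 4-characters-per-token heuristic is an assumption, not the model's real tokenizer):

```python
def batch_by_budget(texts, max_items=16, max_tokens=8191):
    """Greedily pack texts into batches that respect both the item cap
    and an approximate token budget (len/4 as a crude token estimate)."""
    est = lambda t: max(1, len(t) // 4)  # crude heuristic, not a real tokenizer
    batches, current, used = [], [], 0
    for t in texts:
        cost = est(t)
        # Start a new batch when either limit would be exceeded.
        if current and (len(current) >= max_items or used + cost > max_tokens):
            batches.append(current)
            current, used = [], 0
        current.append(t)
        used += cost
    if current:
        batches.append(current)
    return batches
```

With 20 single-character inputs this yields batches of 16 and 4, matching the Azure cap of 16 inputs per request for text-embedding-ada-002.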

Fixes #3294.

Description

Text data is sent as an array to the embeddings endpoint instead of an individual request per text. This also aligns embedding request functionality with how the HuggingFace connector works.
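The behavioral change can be sketched language-agnostically (Python here, since the connector itself is C#; the `model` and `input` field names follow the public OpenAI embeddings API, which accepts either a single string or an array of strings in `input`):

```python
def per_item_payloads(texts, model="text-embedding-ada-002"):
    """Old behavior: one request body per text, i.e. N round trips."""
    return [{"model": model, "input": t} for t in texts]

def batched_payload(texts, model="text-embedding-ada-002"):
    """New behavior: a single request body carrying all texts at once."""
    return {"model": model, "input": list(texts)}
```

For three texts the old approach produces three request bodies while the new approach produces one, which is where the performance improvement comes from.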

Contribution Checklist

@anthonypuppo anthonypuppo requested a review from a team as a code owner October 25, 2023 13:33
@shawncal shawncal added .NET Issue or Pull requests regarding .NET code kernel Issues or pull requests impacting the core kernel labels Oct 25, 2023
@anthonypuppo (Contributor Author) commented:

Non-breaking at compile time, but may break at runtime for users passing more data than the model allows (a combined limit of 16 inputs and 8,191 tokens for text-embedding-ada-002).
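Callers who previously passed arbitrarily long lists can avoid the runtime limit by chunking their inputs before calling the connector. A minimal count-based sketch (Python for illustration; the cap of 16 comes from the Azure OpenAI documentation quoted above, and token budgeting is omitted for brevity):

```python
def chunk_inputs(texts, max_batch_size=16):
    """Split a list of texts into batches no larger than the model's
    per-request input cap (16 for text-embedding-ada-002 on Azure)."""
    return [texts[i:i + max_batch_size]
            for i in range(0, len(texts), max_batch_size)]
```

A caller with 40 documents would then issue three requests (16, 16, and 8 inputs) instead of one oversized request that the endpoint rejects.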

@@ -154,27 +154,31 @@ await foreach (StreamingChoice choice in streamingChatCompletions.GetChoicesStre
    IList<string> data,
    CancellationToken cancellationToken = default)
{
    var result = new List<ReadOnlyMemory<float>>(data.Count);
    foreach (string text in data)
A reviewer (Contributor) commented on this diff:
The current approach allows passing any number of strings, since they are processed one at a time. The change requires limiting the list to the maximum allowed by OpenAI and Azure, which feels like a breaking change. For instance, clients passing hundreds of strings will have to refactor their code to account for the limits of the selected model.

@markwallace-microsoft (Member) commented:

@anthonypuppo Thanks for your contribution. Based on the comments above, it sounds like we don't need this change. I'm going to close this PR, but if I'm mistaken, please fix the merge conflict and re-open the PR.

@anthonypuppo (Contributor Author) commented:

@markwallace-microsoft This change is absolutely necessary.

The current implementation executes an embedding request per input. That's not necessarily wrong, but it is extremely inefficient, as both OpenAI and Azure OpenAI support batching. It's "breaking" only due to model limits (i.e. text-embedding-ada-002 supports at most 16 inputs and ~8k tokens per batch). Existing implementations may pass large amounts of data, so they would have to refactor to chunk their inputs according to their use case.

FWIW the Hugging Face connector already does this (no loop, send all inputs at once):

private async Task<IList<ReadOnlyMemory<float>>> ExecuteEmbeddingRequestAsync(IList<string> data, CancellationToken cancellationToken)
{
    var embeddingRequest = new TextEmbeddingRequest
    {
        Input = data
    };
    using var httpRequestMessage = HttpRequest.CreatePostRequest(this.GetRequestUri(), embeddingRequest);

Apologies for letting this branch get so out of sync. I'll have it fixed up shortly.

github-merge-queue bot pushed a commit that referenced this pull request Jan 31, 2024
### Motivation and Context
Continuation of #3295

The OpenAI embeddings endpoint and `text-embedding-ada-002` accept
either a single item or an array of items. Rather than send text inputs
one by one, the request should send them all at once for improved
performance.

Note that this will require callers to manage their own batching
strategy (added an example to demonstrate). I view this as a worthwhile
tradeoff for improved performance out of the box.

Fixes #3294.

### Description
- Text data is sent as an array to the embeddings endpoint instead of an
individual request per text.
- Aligns embedding request functionality with how the HuggingFace
connector works.
- Remove unnecessary `ToArray()` call on the response embeddings.

### Contribution Checklist

<!-- Before submitting this PR, please make sure: -->

- [x] The code builds clean without any errors or warnings
- [x] The PR follows the [SK Contribution
Guidelines](https://github.com/microsoft/semantic-kernel/blob/main/CONTRIBUTING.md)
and the [pre-submission formatting
script](https://github.com/microsoft/semantic-kernel/blob/main/CONTRIBUTING.md#development-scripts)
raises no violations
- [x] All unit tests pass, and I have added new tests where possible
- [ ] I didn't break anyone 😄 (implementations passing more than
16 inputs will get an error from the OpenAI endpoint and need to batch
appropriately)

---------

Co-authored-by: Dmytro Struk <13853051+dmytrostruk@users.noreply.github.com>
Bryan-Roe pushed a commit to Bryan-Roe-ai/semantic-kernel that referenced this pull request Oct 6, 2024
Successfully merging this pull request may close these issues.

.Net: Batch embeddings request
4 participants