.Net: Batch OpenAI embeddings request #3295
Conversation
Non-breaking at compile time, but may break at runtime for users who are passing more data than the model allows (a combined limit of 16 inputs and 8,191 tokens for `text-embedding-ada-002`).
```diff
@@ -154,27 +154,31 @@ await foreach (StreamingChoice choice in streamingChatCompletions.GetChoicesStre
     IList<string> data,
     CancellationToken cancellationToken = default)
 {
     var result = new List<ReadOnlyMemory<float>>(data.Count);
     foreach (string text in data)
```
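For reference, the batched shape this PR moves the method toward looks roughly like the following. This is a sketch only; `ExecuteEmbeddingRequestAsync` is a hypothetical helper standing in for the connector's actual call to the embeddings endpoint, not the real internal name:

```csharp
// Sketch of the batched version: one request carrying the whole list,
// rather than one request per string.
public async Task<IList<ReadOnlyMemory<float>>> GenerateEmbeddingsAsync(
    IList<string> data,
    CancellationToken cancellationToken = default)
{
    // ExecuteEmbeddingRequestAsync (hypothetical) posts all inputs in a
    // single call to the OpenAI/Azure OpenAI embeddings endpoint.
    return await this.ExecuteEmbeddingRequestAsync(data, cancellationToken).ConfigureAwait(false);
}
```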
The current approach allows passing any number of strings, since they are processed one at a time. The change requires limiting the list to the maximum allowed by OpenAI and Azure. It feels like a breaking change. For instance, clients passing hundreds of strings will have to refactor their code to account for the limits of the selected model.
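For illustration, that refactor would amount to caller-side chunking along these lines. This is a sketch only: `texts`, `embeddingService`, and the batch size of 16 are assumptions based on the `text-embedding-ada-002` limits discussed above, and callers must still keep each batch under the model's token cap:

```csharp
// Sketch: split a large input list into batches the model accepts,
// then call the batched GenerateEmbeddingsAsync once per batch.
const int MaxBatchSize = 16; // text-embedding-ada-002 input cap per request

var embeddings = new List<ReadOnlyMemory<float>>(texts.Count);
for (int i = 0; i < texts.Count; i += MaxBatchSize)
{
    // Take the next slice of up to 16 inputs (requires System.Linq).
    List<string> batch = texts.Skip(i).Take(MaxBatchSize).ToList();
    embeddings.AddRange(await embeddingService.GenerateEmbeddingsAsync(batch, cancellationToken));
}
```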
@anthonypuppo Thanks for your contribution. Based on the comments above, it sounds like we don't need this change. I'm going to close this PR, but if I'm mistaken, please fix the merge conflict and re-open the PR.
@markwallace-microsoft This change is absolutely necessary. The current implementation executes an embedding request per input. That's not necessarily wrong, but it's extremely inefficient, as both OpenAI and Azure OpenAI support batching. It's "breaking" due to model limits (i.e. `text-embedding-ada-002` only supports a maximum of 16 inputs and ~8k tokens per batch). Existing implementations may pass large amounts of data, so those users would have to refactor to handle input chunking according to their use case. FWIW, the Hugging Face connector already does this (no loop, all inputs sent at once): Lines 105 to 112 in 0772d01
Apologies for letting this branch get so out of sync. I'll have it fixed up shortly.
### Motivation and Context

Continuation of #3295. The OpenAI embeddings endpoint and `text-embedding-ada-002` support either a single item or an array of items. Rather than send text inputs one by one, the request should send them all at once for improved performance. Note that this requires callers to manage their own batching strategy (an example was added to demonstrate). I view this as a worthwhile tradeoff for improved performance out of the box.

Fixes #3294.

### Description

- Text data is sent as an array to the embeddings endpoint instead of an individual request per text.
- Aligns embedding request functionality with how the HuggingFace connector works.
- Removes the unnecessary `ToArray()` call on the response embeddings.

### Contribution Checklist

- [x] The code builds clean without any errors or warnings
- [x] The PR follows the [SK Contribution Guidelines](https://github.com/microsoft/semantic-kernel/blob/main/CONTRIBUTING.md) and the [pre-submission formatting script](https://github.com/microsoft/semantic-kernel/blob/main/CONTRIBUTING.md#development-scripts) raises no violations
- [x] All unit tests pass, and I have added new tests where possible
- [ ] I didn't break anyone 😄 (implementations passing more than 16 inputs will get an error from the OpenAI endpoint and need to batch appropriately)

Co-authored-by: Dmytro Struk <13853051+dmytrostruk@users.noreply.github.com>
Motivation and Context
The OpenAI embeddings endpoint and `text-embedding-ada-002` support either a single item or an array of items. Rather than send text inputs one by one, the request should send them all at once for improved performance.
Fixes #3294.
Description
Text data is sent as an array to the embeddings endpoint instead of an individual request per text. This also aligns embedding request functionality with how the HuggingFace connector works.
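As a usage sketch of the batched behavior (the `embeddingService` variable and input strings are illustrative assumptions, not part of the PR):

```csharp
IList<string> texts = new List<string> { "alpha", "beta", "gamma" };

// After this change: one embeddings request covering all three inputs,
// instead of three separate requests.
IList<ReadOnlyMemory<float>> embeddings =
    await embeddingService.GenerateEmbeddingsAsync(texts);

// Results align with inputs by position: embeddings[i] is for texts[i].
Console.WriteLine(embeddings.Count); // 3
```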
Contribution Checklist