
4 thread in parallel #147

Merged
merged 25 commits into from
Apr 16, 2024

Conversation

KSemenenko
Contributor

Motivation and Context (Why the change? What's the scenario?)

Related to #131, I'm trying to speed up document processing.

High level description (Approach, Design)

Just parallel tasks for now.

So this PR is meant to discuss this idea to improve performance.

@dluc
Collaborator

dluc commented Nov 9, 2023

The pipeline scalability is based on asynchronous queues being processed in parallel. If a message in the queue is taking too long because it is doing too much work and the work could be divided, why not leverage the existing infrastructure and split the task over multiple messages?

About embedding, many embedding generators support passing a list of strings to generate multiple embeddings at once. Maybe we should look into that too?
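The batching idea above can be sketched roughly as follows. This is an illustrative Python sketch, not Kernel Memory's actual API: `embed_batch` is a hypothetical stand-in for an embedding generator that accepts a list of strings and makes a single round trip.

```python
# Illustrative sketch (not Kernel Memory's actual API): send N texts in one
# embedding request instead of making N separate requests.

def embed_batch(texts):
    """Hypothetical embedding generator accepting a list of strings and
    returning one vector per string in a single round trip."""
    # A real generator would call the model service here; this fake one
    # returns a 1-dimensional "vector" derived from the text length.
    return [[float(len(t))] for t in texts]

def embed_one_by_one(texts):
    # N round trips: the pattern the comment suggests avoiding.
    return [embed_batch([t])[0] for t in texts]

partitions = ["chunk one", "chunk two", "chunk three"]

# One request covers all partitions instead of three.
vectors = embed_batch(partitions)
assert len(vectors) == len(partitions)
```

The point is that per-request overhead (network latency, rate limiting) is paid once per batch rather than once per partition.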

@anthonypuppo
Contributor

About embedding, many embedding generators support passing a list of strings to generate multiple embeddings at once. Maybe we should look into that too?

@dluc FWIW I have an open PR in the SK repo to enable batching microsoft/semantic-kernel#3295. If SK supports batching natively the changes required here are minimal. Or, could create a KM specific ITextEmbedding interface and change the default implementation (would mean rewriting pre-existing implementations that SK has though like OpenAI, HuggingFace, etc).

@KSemenenko
Contributor Author

KSemenenko commented Nov 9, 2023

The pipeline scalability is based on asynchronous queues being processed in parallel. If a message in the queue is taking too long because it is doing too much work and the work could be divided, why not leverage the existing infrastructure and split the task over multiple messages?

About embedding, many embedding generators support passing a list of strings to generate multiple embeddings at once. Maybe we should look into that too?

As I see in my tests, uploadedFile.GeneratedFiles contains small files that are parts of the big one, so I need to improve performance somehow.

Maybe it's a good idea to send a bunch of strings, because the files aren't the problem; the large number of small embedding requests is the problem.

@dluc
Collaborator

dluc commented Nov 13, 2023

Here's my suggestion: rather than changing GenerateEmbeddingsHandler, create a new handler, e.g. GenerateEmbeddingsInParallelHandler, assign a name e.g. gen_embeddings_parallel, and select this handler in your deployments. You can select handlers while inserting data (see steps param) or configure the service to use your custom handlers by default.

Longer term, what I would recommend investigating:

  • currently, after extracting text from a file and partitioning the text into multiple chunks, only 1 message is queued, moving from the "partition" step to the next step, "gen_embeddings"
  • rather than enqueuing only 1 message, enqueue N messages, one for every partition. This would allow processing all partitions in parallel over N machines, leveraging the existing pub-sub infra.

Pros and Cons:

  • Pros: infinite scaling, ignoring individual nodes specs (e.g. number of cores).
  • Cons: code changes to the core of the pipeline, vs changes to a single handler. E.g. ensuring N messages are enqueued at least once, aka increased logic to be resilient.
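The "one message per partition" idea can be sketched like this. The queue and message shape are illustrative, not Kernel Memory's actual pipeline types.

```python
# Sketch of "enqueue N messages, one per partition": after partitioning,
# each chunk becomes its own queue message, so N workers (or machines)
# can process partitions independently via the existing pub-sub infra.
from queue import Queue

def enqueue_partitions(queue, document_id, partitions):
    # One message per partition instead of one message for the whole document.
    for index, text in enumerate(partitions):
        queue.put({"doc": document_id, "partition": index, "text": text})

q = Queue()
enqueue_partitions(q, "doc-1", ["part a", "part b", "part c"])

# In the real system these would be consumed by separate workers;
# here we just drain the queue to show what was enqueued.
messages = [q.get() for _ in range(q.qsize())]
```

The resilience cost mentioned in the cons is that the pipeline must now track N acknowledgements per step instead of one, and tolerate each message being delivered at least once.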

@KSemenenko
Contributor Author

@dluc btw how can I add my custom handler? I don't see any extensions like WithCustomHandler

@dluc
Collaborator

dluc commented Nov 14, 2023

@dluc btw how can I add my custom handler? I don't see any extensions like WithCustomHandler

I was about to provide an example, but the code is too complex. Currently there are two different methods: AddHandler in the Memory class (if you are using the serverless memory) and AddHandlerAsync in the orchestrator (if you are using the service). A bit too hard to set up; it needs some work to follow the usual builder approach.

@KSemenenko
Contributor Author

@dluc what do you think about WithCustomHandler I added in this PR?

# Conflicts:
#	service/Core/Handlers/GenerateParallelEmbeddingsHandler.cs
#	service/Core/Handlers/SummarizationParallelHandler.cs
#	service/tests/FunctionalTests/National-Planning-Policy-Framework.pdf
@KSemenenko
Contributor Author

@dluc I tested the latest version and it works fine performance-wise, so maybe we can convert this PR into "custom handler extensions"?

@dluc dluc force-pushed the parallel-pipline branch from 52c5c85 to 73b3461 Compare March 15, 2024 12:24
@dluc dluc force-pushed the parallel-pipline branch from e1bdd83 to 4b83cf1 Compare March 15, 2024 12:42
@dluc
Collaborator

dluc commented Mar 15, 2024

PR updated. If the code is still working it could be merged as is. Handlers can now be configured in the service without touching dependency injection or other files (see the appsettings.json list of handlers).

@KSemenenko
Contributor Author

I will check this code

# Conflicts:
#	service/Core/KernelMemoryBuilder.cs
#	service/tests/Core.FunctionalTests/National-Planning-Policy-Framework.pdf
#	service/tests/Core.FunctionalTests/ServerLess/SubDirFilesAndStreamsTest.cs
#	service/tests/Service.FunctionalTests/Service.FunctionalTests.csproj
…pipline

# Conflicts:
#	service/Core/AppBuilders/DependencyInjection.cs
#	service/Core/KernelMemoryBuilder.cs
@KSemenenko
Contributor Author

@dluc I made some changes in the processing, what do you think about it? I now prefer a more standard way: parallel foreach. Also, I think it can be just part of a regular handler, since it really relies on asynchronous operations. As an option, we could make configurable how many threads this is distributed over, because I feel that supporting 2-3 handlers might be challenging.
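The "parallel foreach with a configurable degree of parallelism" approach described above can be sketched as follows. This is a Python analogue of .NET's Parallel.ForEach with MaxDegreeOfParallelism, not the PR's actual C# code.

```python
# Sketch of a bounded parallel foreach: process partitions concurrently,
# capping how many are in flight at once (analogous to .NET's
# Parallel.ForEach with ParallelOptions.MaxDegreeOfParallelism).
from concurrent.futures import ThreadPoolExecutor

def process_partition(text):
    # Stand-in for per-partition work, e.g. an embedding or summary request.
    return text.upper()

def process_all(partitions, max_workers=4):
    # max_workers caps concurrency; results come back in input order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(process_partition, partitions))

results = process_all(["a", "b", "c"], max_workers=2)
```

Since the per-partition work is I/O-bound (remote model calls), a thread pool suffices and the cap keeps the handler from flooding the embedding service.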

@KSemenenko
Contributor Author

Also, I use lock (summaryFiles) because it's much faster than ConcurrentQueue or similar.
And this is a private class, so there shouldn't be any side effects.
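The lock-over-a-plain-collection pattern mentioned above looks roughly like this. A Python sketch, not the PR's C# `lock (summaryFiles)` code.

```python
# Sketch of guarding a plain list with a lock while parallel tasks append
# results: a short critical section on a private collection is cheap and
# avoids the overhead of a concurrent queue when contention is low.
import threading
from concurrent.futures import ThreadPoolExecutor

summary_files = []                  # plain list, private to the handler
summary_lock = threading.Lock()

def handle(item):
    result = item * 2               # stand-in for summarization work
    with summary_lock:              # only the append is serialized
        summary_files.append(result)

with ThreadPoolExecutor(max_workers=4) as pool:
    list(pool.map(handle, range(10)))
```

Completion order is nondeterministic, so the list's order varies run to run; only membership is guaranteed, which is fine when the results are later keyed by partition rather than by position.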

Collaborator

@dluc dluc left a comment


I added some tests and renamed the handlers, not to replace the default ones. From my tests the parallel embeddings handler shows a faster execution, while the summarization handler takes about the same time to generate a summary. The handlers can be used on demand, while the default ones are still in use.

@dluc dluc merged commit 5a816f8 into microsoft:main Apr 16, 2024
2 checks passed
@dluc dluc mentioned this pull request Apr 16, 2024
@KSemenenko KSemenenko deleted the parallel-pipline branch April 17, 2024 17:56
@KSemenenko
Contributor Author

This is amazing! Thanks a lot!
