
4 thread in parallel #147

Merged
merged 25 commits into from
Apr 16, 2024

Conversation

KSemenenko
Contributor

Motivation and Context (Why the change? What's the scenario?)

Related to #131, I'm trying to speed up document processing.

High level description (Approach, Design)

Just parallel tasks for now.

So this PR is meant to discuss this idea to improve performance.

@dluc
Collaborator

dluc commented Nov 9, 2023

The pipeline scalability is based on asynchronous queues being processed in parallel. If a message in the queue is taking too long because it is doing too much work and the work could be divided, why not leverage the existing infrastructure and split the task over multiple messages?

About embedding, many embedding generators support passing a list of strings to generate multiple embeddings at once. Maybe we should look into that too?
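The batching idea above can be sketched roughly as follows. This is an illustrative Python sketch, not Kernel Memory's actual API: `embed_batch` is a hypothetical stand-in for an embedding generator that accepts a list of strings and makes a single round trip.

```python
# Illustrative sketch (not Kernel Memory's actual API): send N texts in one
# embedding request instead of making N separate requests.

def embed_batch(texts):
    """Hypothetical embedding generator accepting a list of strings and
    returning one vector per string in a single round trip."""
    # A real generator would call the model service here; this fake one
    # returns a 1-dimensional "vector" derived from the text length.
    return [[float(len(t))] for t in texts]

def embed_one_by_one(texts):
    # N round trips: the pattern the comment suggests avoiding.
    return [embed_batch([t])[0] for t in texts]

partitions = ["chunk one", "chunk two", "chunk three"]

# One request covers all partitions instead of three.
vectors = embed_batch(partitions)
assert len(vectors) == len(partitions)
```

The point is that per-request overhead (network latency, rate limiting) is paid once per batch rather than once per partition.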

@anthonypuppo
Contributor

About embedding, many embedding generators support passing a list of strings to generate multiple embeddings at once. Maybe we should look into that too?

@dluc FWIW I have an open PR in the SK repo to enable batching microsoft/semantic-kernel#3295. If SK supports batching natively the changes required here are minimal. Or, could create a KM specific ITextEmbedding interface and change the default implementation (would mean rewriting pre-existing implementations that SK has though like OpenAI, HuggingFace, etc).

@KSemenenko
Contributor Author

KSemenenko commented Nov 9, 2023

The pipeline scalability is based on asynchronous queues being processed in parallel. If a message in the queue is taking too long because it is doing too much work and the work could be divided, why not leverage the existing infrastructure and split the task over multiple messages?

About embedding, many embedding generators support passing a list of strings to generate multiple embeddings at once. Maybe we should look into that too?

As I see in my tests, uploadedFile.GeneratedFiles contains small files that are parts of the big one, so I need to improve performance somehow.

Maybe it's a good idea to send a bunch of strings, because the files aren't the problem; the large number of small embedding requests is the problem.

@dluc
Collaborator

dluc commented Nov 13, 2023

Here's my suggestion: rather than changing GenerateEmbeddingsHandler, create a new handler, e.g. GenerateEmbeddingsInParallelHandler, assign a name e.g. gen_embeddings_parallel, and select this handler in your deployments. You can select handlers while inserting data (see steps param) or configure the service to use your custom handlers by default.

Longer term, what I would recommend investigating:

  • currently, after extracting text from a file and partitioning the text into multiple chunks, only 1 message is queued, moving from the "partition" step to the next step, "gen_embeddings"
  • rather than enqueuing only 1 message, enqueue N messages, one for every partition. This would allow processing all partitions in parallel over N machines, leveraging the existing pub-sub infra.

Pros and Cons:

  • Pros: infinite scaling, ignoring individual nodes specs (e.g. number of cores).
  • Cons: code changes to the core of the pipeline, vs changes to a single handler. E.g. ensuring N messages are enqueued at least once, aka increased logic to be resilient.
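The "one message per partition" idea can be sketched like this. The queue and message shape are illustrative, not Kernel Memory's actual pipeline types.

```python
# Sketch of "enqueue N messages, one per partition": after partitioning,
# each chunk becomes its own queue message, so N workers (or machines)
# can process partitions independently via the existing pub-sub infra.
from queue import Queue

def enqueue_partitions(queue, document_id, partitions):
    # One message per partition instead of one message for the whole document.
    for index, text in enumerate(partitions):
        queue.put({"doc": document_id, "partition": index, "text": text})

q = Queue()
enqueue_partitions(q, "doc-1", ["part a", "part b", "part c"])

# In the real system these would be consumed by separate workers;
# here we just drain the queue to show what was enqueued.
messages = [q.get() for _ in range(q.qsize())]
```

The resilience cost mentioned in the cons is that the pipeline must now track N acknowledgements per step instead of one, and tolerate each message being delivered at least once.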

@KSemenenko
Contributor Author

@dluc btw how can I add my custom handler? I don't see any extensions like WithCustomHandler

@dluc
Collaborator

dluc commented Nov 14, 2023

@dluc btw how can I add my custom handler? I don't see any extensions like WithCustomHandler

I was about to provide an example, but the code is too complex. Currently there are two different methods: AddHandler in the Memory class (if you are using the serverless memory) and AddHandlerAsync in the orchestrator (if you are using the service). A bit too hard to set up; it needs some work to follow the usual builder approach.

@KSemenenko
Contributor Author

@dluc what do you think about WithCustomHandler I added in this PR?

# Conflicts:
#	service/Core/Handlers/GenerateParallelEmbeddingsHandler.cs
#	service/Core/Handlers/SummarizationParallelHandler.cs
#	service/tests/FunctionalTests/National-Planning-Policy-Framework.pdf
@KSemenenko
Contributor Author

@dluc I tested the latest version and it works fine performance-wise, so maybe we can convert this PR into "custom handler extensions"?

@dluc dluc force-pushed the parallel-pipline branch from 52c5c85 to 73b3461 Compare March 15, 2024 12:24
@dluc dluc force-pushed the parallel-pipline branch from e1bdd83 to 4b83cf1 Compare March 15, 2024 12:42
@dluc
Collaborator

dluc commented Mar 15, 2024

PR updated. If the code is still working it could be merged as is. Handlers can now be configured in the service without touching dependency injection or other files (see the appsettings.json list of handlers).

@KSemenenko
Contributor Author

I will check this code

# Conflicts:
#	service/Core/KernelMemoryBuilder.cs
#	service/tests/Core.FunctionalTests/National-Planning-Policy-Framework.pdf
#	service/tests/Core.FunctionalTests/ServerLess/SubDirFilesAndStreamsTest.cs
#	service/tests/Service.FunctionalTests/Service.FunctionalTests.csproj
…pipline

# Conflicts:
#	service/Core/AppBuilders/DependencyInjection.cs
#	service/Core/KernelMemoryBuilder.cs
@KSemenenko
Contributor Author

@dluc I made some changes in the processing, what do you think about it? I now prefer a more standard way: parallel foreach. Also, I think it can be just part of a regular handler, since it really relies on asynchronous operations. As an option, we could make configurable how many threads this is distributed over, because I feel that supporting 2-3 handlers might be challenging.
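The "parallel foreach with a configurable degree of parallelism" approach described above can be sketched as follows. This is a Python analogue of .NET's Parallel.ForEach with MaxDegreeOfParallelism, not the PR's actual C# code.

```python
# Sketch of a bounded parallel foreach: process partitions concurrently,
# capping how many are in flight at once (analogous to .NET's
# Parallel.ForEach with ParallelOptions.MaxDegreeOfParallelism).
from concurrent.futures import ThreadPoolExecutor

def process_partition(text):
    # Stand-in for per-partition work, e.g. an embedding or summary request.
    return text.upper()

def process_all(partitions, max_workers=4):
    # max_workers caps concurrency; results come back in input order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(process_partition, partitions))

results = process_all(["a", "b", "c"], max_workers=2)
```

Since the per-partition work is I/O-bound (remote model calls), a thread pool suffices and the cap keeps the handler from flooding the embedding service.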

@KSemenenko
Contributor Author

Also, I use lock (summaryFiles) because it's much faster than ConcurrentQueue or similar.
And this is a private class, so there shouldn't be any side effects.
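The lock-over-a-plain-collection pattern mentioned above looks roughly like this. A Python sketch, not the PR's C# `lock (summaryFiles)` code.

```python
# Sketch of guarding a plain list with a lock while parallel tasks append
# results: a short critical section on a private collection is cheap and
# avoids the overhead of a concurrent queue when contention is low.
import threading
from concurrent.futures import ThreadPoolExecutor

summary_files = []                  # plain list, private to the handler
summary_lock = threading.Lock()

def handle(item):
    result = item * 2               # stand-in for summarization work
    with summary_lock:              # only the append is serialized
        summary_files.append(result)

with ThreadPoolExecutor(max_workers=4) as pool:
    list(pool.map(handle, range(10)))
```

Completion order is nondeterministic, so the list's order varies run to run; only membership is guaranteed, which is fine when the results are later keyed by partition rather than by position.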

Collaborator

@dluc dluc left a comment


I added some tests and renamed the handlers, not to replace the default ones. From my tests the parallel embeddings handler shows a faster execution, while the summarization handler takes about the same time to generate a summary. The handlers can be used on demand, while the default ones are still in use.

@dluc dluc merged commit 5a816f8 into microsoft:main Apr 16, 2024
2 checks passed
@dluc dluc mentioned this pull request Apr 16, 2024
@KSemenenko KSemenenko deleted the parallel-pipline branch April 17, 2024 17:56
@KSemenenko
Contributor Author

This is amazing! Thanks a lot!
