[Epic] Improve concurrency in LLMBlock #135
Please use this: aakankshaduggal#6. aakankshaduggal#8 has been closed in favor of aakankshaduggal#6.
This was referenced Jul 18, 2024.

gabe-l-hart added a commit to gabe-l-hart/instructlab-sdg that referenced this issue on Jul 18, 2024:
Problem statement from npalaska@redhat.com:

Overview
The current implementation of LLMBlock sends the entire dataset as a single large batch of requests to the OpenAI server. This may lead to some requests waiting too long for a response, resulting in timeout errors and potentially overloading the backend server with extremely large batches.

Proposed Changes
Use concurrent processing with Python's concurrent.futures package in LLMBlock. The key changes are:
* Uses concurrent.futures with threading to launch and manage parallel tasks.
* Allows users to specify the number of requests to send in each batch.
* Allows users to specify the number of concurrent worker threads to handle batches.

Example Usage
If the user sets the concurrency to 8 and the batch size to 32, the system will run 8 concurrent threads, each sending a batch of 32 prompts, resulting in a total of 256 requests processed simultaneously by the backend server.

Ref: aakankshaduggal#6, instructlab#135

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
Co-authored-by: Nikhil Palaskar <npalaska@redhat.com>
Co-authored-by: shiv <shivchander.s30@gmail.com>
Co-authored-by: Kai Xu <xuk@ibm.com>
Co-authored-by: Aakanksha Duggal <aduggal@redhat.com>
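A minimal sketch of the batching-plus-threading pattern described in that commit message; the names here (generate_concurrently, send_batch, chunk) are illustrative placeholders, not the actual LLMBlock API:

```python
from concurrent.futures import ThreadPoolExecutor


def chunk(prompts, batch_size):
    """Split the full list of prompts into fixed-size batches."""
    return [prompts[i:i + batch_size] for i in range(0, len(prompts), batch_size)]


def generate_concurrently(prompts, send_batch, batch_size=32, num_workers=8):
    """Send batches of prompts to the backend with a pool of worker threads.

    `send_batch` is assumed to take a list of prompts, call the
    OpenAI-compatible server, and return a list of completions.
    """
    results = []
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # pool.map keeps batch order while allowing up to num_workers
        # batches to be in flight at the same time.
        for batch_results in pool.map(send_batch, chunk(prompts, batch_size)):
            results.extend(batch_results)
    return results
```

With the defaults shown, up to 8 batches of 32 prompts, i.e. 256 requests, can be outstanding against the backend at once, matching the example usage above.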
gabe-l-hart added a commit to gabe-l-hart/instructlab-sdg that referenced this issue on Jul 19, 2024:
instructlab#135 Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

gabe-l-hart added a commit to gabe-l-hart/instructlab-sdg that referenced this issue on Jul 19, 2024:
instructlab#135 Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

gabe-l-hart added a commit to gabe-l-hart/instructlab-sdg that referenced this issue on Jul 19, 2024:
…text This allows None to be used as a default in generate_data and from the CLI. instructlab#135 Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

gabe-l-hart added a commit to gabe-l-hart/instructlab-sdg that referenced this issue on Jul 19, 2024:
instructlab#135 Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

gabe-l-hart added a commit to gabe-l-hart/instructlab-sdg that referenced this issue on Jul 19, 2024:
This is a mitigation to allow the `instructlab-sdg` library to merge and release before `instructlab` has updated the CLI invocation of generate_data to properly distinguish between backend types. It should be reverted once that change is made in the CLI. instructlab#135 Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
markmc added a commit to gabe-l-hart/instructlab that referenced this issue on Jul 27, 2024:
Relates to instructlab/sdg#135. Since instructlab-sdg-0.1.3, data generation in batches is supported and controlled by a parameter to the `generate_data()` function. This is not supported with llama-cpp, and so we disable it in that case.
Co-authored-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
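A minimal sketch of the backend check that commit describes, assuming a hypothetical helper and backend-type strings; the real instructlab CLI wiring and the `generate_data()` parameter name may differ:

```python
from typing import Optional


def effective_batch_size(requested: Optional[int], backend_type: str) -> Optional[int]:
    """Return the batch size to pass to generate_data(), or None to disable batching.

    Hypothetical helper: the backend-type values here are illustrative.
    """
    if backend_type == "llama-cpp":
        # llama-cpp serving cannot absorb large bursts of concurrent
        # requests, so batching is switched off for that backend.
        return None
    return requested


# Batching stays on for an OpenAI-compatible server such as vLLM,
# but is disabled automatically for llama-cpp.
assert effective_batch_size(32, "vllm") == 32
assert effective_batch_size(32, "llama-cpp") is None
```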
markmc added four more commits with the same message to gabe-l-hart/instructlab that referenced this issue on Jul 27, 2024.
jwm4 pushed a commit to jwm4/sdg that referenced this issue on Dec 13, 2024:
…_actions/DavidAnson/markdownlint-cli2-action-17.0.0 Bump DavidAnson/markdownlint-cli2-action from 16.0.0 to 17.0.0
From aakankshaduggal#8:

Overview
The current implementation of LLMBlock sends the entire dataset as a single large batch of requests to the OpenAI server. This may lead to some requests waiting too long for a response, resulting in timeout errors and potentially overloading the backend server with extremely large batches.

Proposed Changes
Use concurrent processing with Python's concurrent.futures package in LLMBlock. The key changes are:
* Uses concurrent.futures with threading to launch and manage parallel tasks.
* Allows users to specify the number of requests to send in each batch.
* Allows users to specify the number of concurrent worker threads to handle batches.

Example Usage
If the user sets the concurrency to 8 and the batch size to 32, the system will run 8 concurrent threads, each sending a batch of 32 prompts, resulting in a total of 256 requests processed simultaneously by the backend server.
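As a quick sanity check of that example, the batch and worker counts combine as follows (a small illustrative calculation with an assumed dataset size, not code from the SDG library):

```python
import math

# Hypothetical numbers matching the example above.
num_workers = 8      # concurrent worker threads
batch_size = 32      # prompts per request batch
dataset_size = 1024  # assumed dataset size, for illustration only

max_in_flight = num_workers * batch_size            # 8 * 32 = 256 requests at once
num_batches = math.ceil(dataset_size / batch_size)  # 1024 / 32 = 32 batches
waves = math.ceil(num_batches / num_workers)        # 32 / 8 = 4 rounds of work

print(max_in_flight, num_batches, waves)            # 256 32 4
```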
* `ilab data generate` batching should be disabled automatically with a remote llama-cpp endpoint (instructlab#1892)