Enable SDG batching with vLLM #1797
Conversation
Force-pushed from e9d0d02 to bda25e9
Force-pushed from bda25e9 to f052f4b
This pull request has merge conflicts that must be resolved before it can be merged.
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Force-pushed from f052f4b to 973a178
Force-pushed from 973a178 to 8929260
Force-pushed from 8929260 to 7b620b4
src/instructlab/data/generate.py (Outdated)
if isinstance(backend_instance, llama_cpp_server):
    batch_size = 0
    logger.warning(
        "Disabling SDG batching - unsupported with llama.cpp serving"
    )
Should we only log this message when the user requested a particular batch-size?
Yep, good point
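A minimal sketch of the suggested tweak, assuming `None` is used as the sentinel for "no batch size requested" (the actual CLI default handling may differ):

```python
if isinstance(backend_instance, llama_cpp_server):
    if batch_size is not None:
        # Only warn when the user explicitly requested a batch size;
        # otherwise fall back to unbatched generation silently.
        logger.warning(
            "Disabling SDG batching - unsupported with llama.cpp serving"
        )
    batch_size = 0
```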
Force-pushed from 7954527 to fcb84b3
Relates to instructlab/sdg#135

Since instructlab-sdg-0.1.3, data generation in batches is supported and controlled by a parameter to the `generate_data()` function. This is not supported with llama-cpp, so we disable it in that case.

Co-authored-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Now that we require a new enough version of instructlab-sdg, we can expose this.

Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Force-pushed from fcb84b3 to fafa8c0
@gabe-l-hart was that intentional? Sorry I didn't coordinate - just looking to get this merged quickly.
Also add a troubleshooting note referencing instructlab#1892, which tracks a todo item to add a way to automatically disable batching in this case.

Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Force-pushed from fafa8c0 to 3c03122
@markmc Sorry if that wasn't wanted! I got the mergebot conflict notification and wanted to clear the notification before diving into kid time and becoming unavailable for the rest of the weekend. Feel free to force push back if needed. (yikes, typing on a phone is hard!)
Epic: instructlab/sdg#135
Requires: instructlab/sdg#157
When using a serving backend that supports batches of requests (i.e. vLLM), the current SDG pipeline sends the entire dataset as a single large batch of requests to the OpenAI server. This may lead to some requests waiting too long for the response, resulting in timeout errors and potentially overloading the backend server with extremely large batches.
SDG now supports launching parallel tasks to send smaller batches in parallel to the OpenAI server. This is controlled by the `batch_size` argument to `generate_data()`:

- `batch_size=None` - use the library's built-in default batch size (currently 0, but this will change to 8 in a future release)
- `batch_size=0` - disable batching, just use one batch
- `batch_size=<int>` - use batches of the specified size

If the user sets the batch size to 8, the system will run a concurrent thread on each CPU of the system, each sending a batch of 8 prompts. So, on an 8-CPU system, this results in a total of 64 requests processed simultaneously by the backend server (see the sketch below for an illustration).
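To make the concurrency math above concrete, here is a minimal sketch of the dispatch pattern described. It is not the actual SDG implementation; `send_batch()` and `generate_in_batches()` are hypothetical names standing in for the library's internals.

```python
# Illustrative sketch only: SDG's real batching lives in instructlab-sdg;
# send_batch() stands in for one batch of requests to the OpenAI-compatible server.
import os
from concurrent.futures import ThreadPoolExecutor


def send_batch(batch):
    # Placeholder: issue one request batch and collect the responses.
    return [f"response for {prompt!r}" for prompt in batch]


def generate_in_batches(prompts, batch_size=8):
    if batch_size == 0:
        # Batching disabled: send everything as one large batch.
        return send_batch(prompts)
    batches = [
        prompts[i : i + batch_size] for i in range(0, len(prompts), batch_size)
    ]
    # One worker thread per CPU; on an 8-CPU host with batch_size=8 this means
    # up to 8 x 8 = 64 prompts in flight at once.
    with ThreadPoolExecutor(max_workers=os.cpu_count()) as pool:
        results = []
        for batch_result in pool.map(send_batch, batches):
            results.extend(batch_result)
        return results
```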
This PR adds `--batch-size` to `ilab data generate` to allow overriding the default behaviour of a batch size of 8 with vLLM and zero with llama-cpp.

Supports instructlab/sdg#135
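For context, a hedged sketch of how such a flag could be wired up with click. Only the `--batch-size` option name comes from this PR; the `--backend` option, `resolve_batch_size()` helper, and defaults below are assumptions based on the description above, not the merged code.

```python
# Hedged sketch: ilab's CLI is built on click, but everything here apart from
# the --batch-size option name is illustrative rather than the merged code.
import click


def resolve_batch_size(batch_size, backend):
    # Assumed policy from the PR description: default to 8 for vLLM and to 0
    # (batching disabled) for llama-cpp, unless the user overrides it.
    if batch_size is not None:
        return batch_size
    return 0 if backend == "llama-cpp" else 8


@click.command()
@click.option("--batch-size", type=int, default=None,
              help="Prompts per request batch sent to the serving backend.")
@click.option("--backend", type=click.Choice(["vllm", "llama-cpp"]),
              default="vllm", help="Illustrative stand-in for backend detection.")
def generate(batch_size, backend):
    click.echo(f"Effective batch size: {resolve_batch_size(batch_size, backend)}")


if __name__ == "__main__":
    generate()
```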