Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable SDG batching with vLLM #1797

Merged
merged 4 commits into from
Jul 28, 2024
Merged

Conversation

gabe-l-hart
Copy link
Contributor

@gabe-l-hart gabe-l-hart commented Jul 18, 2024

Epic: instructlab/sdg#135
Requires: instructlab/sdg#157

When using a serving backend that supports batches of requests (i.e. vLLM), the current SDG pipeline sends the entire dataset as a single large batch of requests to the OpenAI server. This may lead to some requests waiting too long for the response, resulting in timeout errors and potentially overloading the backend server with extremely large batches.

SDG now supports launching parallel tasks to send smaller batches in parallel to the OpenAPI server. This is controlled by the batch_size argument to generate_data():

  • batch_size=None - use the library's built-in default batch size (currently 0, but will change to 8 in a future release)
  • batch_size=0 - disable batching, just use one batch
  • batch_size=<int> - use batches of the specified size

If the user sets the batch size to 8, the system will run a concurrent thread on each CPU on the system, each sending a batch of 8 prompts. So, on an 8 CPU system, this would result in a total of 64 requests processed simultaneously by the backend server.

This PR adds --batch-size to ilab data generate to allow overriding the default behaviour of a batch size of 8 with vLLM and zero with llama-cpp.

Supports instructlab/sdg#135

@gabe-l-hart gabe-l-hart marked this pull request as draft July 18, 2024 22:19
@mergify mergify bot added the ci-failure PR has at least one CI failure label Jul 18, 2024
@mergify mergify bot added ci-failure PR has at least one CI failure and removed ci-failure PR has at least one CI failure labels Jul 19, 2024
@mergify mergify bot added ci-failure PR has at least one CI failure and removed ci-failure PR has at least one CI failure labels Jul 19, 2024
@markmc markmc changed the title SDGBatchArguments Enable SDG batching with vLLM Jul 27, 2024
Copy link
Contributor

mergify bot commented Jul 27, 2024

This pull request has merge conflicts that must be resolved before it can be
merged. @gabe-l-hart please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

Signed-off-by: Mark McLoughlin <markmc@redhat.com>
@markmc markmc force-pushed the SDGBatchArguments branch from f052f4b to 973a178 Compare July 27, 2024 09:02
@mergify mergify bot added dependencies Relates to dependencies ci-failure PR has at least one CI failure and removed needs-rebase This Pull Request needs to be rebased ci-failure PR has at least one CI failure labels Jul 27, 2024
@markmc markmc force-pushed the SDGBatchArguments branch from 973a178 to 8929260 Compare July 27, 2024 09:05
@mergify mergify bot added ci-failure PR has at least one CI failure and removed ci-failure PR has at least one CI failure labels Jul 27, 2024
@markmc markmc force-pushed the SDGBatchArguments branch from 8929260 to 7b620b4 Compare July 27, 2024 09:07
@mergify mergify bot added CI/CD Affects CI/CD configuration and removed ci-failure PR has at least one CI failure labels Jul 27, 2024
@mergify mergify bot added documentation Improvements or additions to documentation and removed ci-failure PR has at least one CI failure labels Jul 27, 2024
@markmc markmc requested review from n1hility, tiran, danmcp and leseb July 27, 2024 13:11
if isinstance(backend_instance, llama_cpp_server):
batch_size = 0
logger.warning(
"Disabling SDG batching - unsupported with llama.cpp serving"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we only log this message when the user requested a particular batch-size?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, good point

@mergify mergify bot added the ci-failure PR has at least one CI failure label Jul 27, 2024
gabe-l-hart and others added 2 commits July 27, 2024 18:37
Relates to instructlab/sdg#135

Since instructlab-sdg-0.1.3, data generation in batches is supported and
controlled by an parameter to the `generate_data()` function.

This is not supported with llama-cpp, and so we disable it in that case.

Co-authored-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Now that we require a new enough version of instructlab-sdg, we
can expose this.

Signed-off-by: Mark McLoughlin <markmc@redhat.com>
@markmc markmc force-pushed the SDGBatchArguments branch from fcb84b3 to fafa8c0 Compare July 27, 2024 17:39
@markmc
Copy link
Contributor

markmc commented Jul 27, 2024

gabe-l-hart force-pushed the SDGBatchArguments branch from 7954527 to fcb84b3
3 hours ago

@gabe-l-hart was that intentional? Sorry I didn't coordinate - just looking to get this merged quickly

@markmc markmc requested a review from danmcp July 27, 2024 17:40
Also add a troubleshooting note to reference instructlab#1892 which
tracks a todo item to add some way to automatically disable
in this case.

Signed-off-by: Mark McLoughlin <markmc@redhat.com>
@markmc markmc force-pushed the SDGBatchArguments branch from fafa8c0 to 3c03122 Compare July 27, 2024 17:41
@mergify mergify bot removed the ci-failure PR has at least one CI failure label Jul 27, 2024
@gabe-l-hart
Copy link
Contributor Author

gabe-l-hart commented Jul 27, 2024

@markmc Sorry if that wasn't wanted! I got the mergebot conflict notification and wanted to clear the notification before diving into kid time and becoming unavailable for the rest of the weekend. Feel free to force push back if needed. (yikes, typing on a phone is hard!)

@mergify mergify bot added the one-approval PR has one approval from a maintainer label Jul 27, 2024
@mergify mergify bot removed the one-approval PR has one approval from a maintainer label Jul 27, 2024
@markmc markmc removed request for tiran and leseb July 28, 2024 06:07
@mergify mergify bot merged commit 2a6f213 into instructlab:main Jul 28, 2024
23 checks passed
@gabe-l-hart gabe-l-hart deleted the SDGBatchArguments branch July 29, 2024 14:46
@ktam3 ktam3 added this to the 0.18.0b3 milestone Jul 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CI/CD Affects CI/CD configuration dependencies Relates to dependencies documentation Improvements or additions to documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants