
[BFCL] Fix Hanging Inference for OSS Models on GPU Platforms #663

Merged
merged 4 commits into ShishirPatil:main from the subprocess branch on Oct 5, 2024

Conversation

HuanzhiMao
Collaborator

This PR addresses issues encountered when running locally-hosted models on GPU-renting platforms (e.g., Lambda Cloud). Specifically, there were problems displaying output from `vllm` because these models are launched via subprocesses. Additionally, some multi-turn functions (such as `xargs`) rely on subprocesses, which caused inference on certain test entries (such as `multi_turn_36`) to hang indefinitely, halting the pipeline.
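
For illustration, here is a minimal sketch of the kind of pattern that can cause this; the command line below is a placeholder, not the repository's exact code. The point is that a synchronous read of the subprocess pipe on the main thread blocks everything else, including test entries that spawn their own subprocesses.

```python
import subprocess

# Hypothetical illustration of the failure mode: the vllm server is launched
# via Popen and its output is read synchronously on the main thread.
proc = subprocess.Popen(
    ["python", "-m", "vllm.entrypoints.openai.api_server"],  # placeholder command
    stdout=subprocess.PIPE,
    stderr=subprocess.STDOUT,
    text=True,
)

# Iterating over the pipe blocks until the server writes another line; while
# the main thread is stuck here, nothing else (e.g. test entries that run
# `xargs` through subprocesses) can make progress, so the pipeline hangs.
for line in proc.stdout:
    print(line, end="")
```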

To fix this, the terminal logging logic has been updated to utilize a separate thread for reading from the subprocess pipe and printing to the terminal.
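
A minimal sketch of that pattern, assuming the server is started with `subprocess.Popen` (the command line is a placeholder, not the repository's exact code):

```python
import subprocess
import threading

def _stream_output(pipe):
    """Read lines from the subprocess pipe and echo them to the terminal."""
    for line in iter(pipe.readline, ""):
        print(line, end="", flush=True)

proc = subprocess.Popen(
    ["python", "-m", "vllm.entrypoints.openai.api_server"],  # placeholder command
    stdout=subprocess.PIPE,
    stderr=subprocess.STDOUT,
    text=True,
)

# Drain the pipe on a daemon thread so the main thread stays free to run
# inference, including test entries that spawn their own subprocesses.
reader = threading.Thread(target=_stream_output, args=(proc.stdout,), daemon=True)
reader.start()
```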

Also, for readability, the `_format_prompt` function has been moved to the `Prompting methods` section; this does not change the leaderboard score.

@HuanzhiMao HuanzhiMao added the BFCL-General General BFCL Issue label Sep 27, 2024
@HuanzhiMao HuanzhiMao changed the title [BFCL] [BFCL] Fix Hanging Inference for OSS Models on GPU Platforms Sep 27, 2024
@CharlieJCJ CharlieJCJ (Collaborator) left a comment:

LGTM

@ShishirPatil ShishirPatil merged commit ff169f5 into ShishirPatil:main Oct 5, 2024
@HuanzhiMao HuanzhiMao deleted the subprocess branch October 5, 2024 05:55
VishnuSuresh27 pushed a commit to VishnuSuresh27/gorilla that referenced this pull request Nov 11, 2024
[BFCL] Fix Hanging Inference for OSS Models on GPU Platforms (ShishirPatil#663)
