[Usage]: guided_regex in offline mode #8907
Your current environment

N/A

How would you like to use vllm

I want to use guided_regex, but in offline mode.
I had thought I might be able to pass guided_regex via sampling_params, or when calling llm.chat() or llm.generate(), but I don't see where that can be done.
I know I can do this via a server, but I'd like to do it locally in a notebook instead, without a server. Thanks.

Comments
Note that I have tried:

then:

which gives:

I suspect that, even if this worked, it wouldn't necessarily dispatch to both GPUs, but I'm uncertain. I'm also unclear on where I would need to pass in device_map="auto". I see there is a separate way to load, as follows:

but this gives the following error:

btw, running

and I'm in Kaggle with 2xT4 (I want to run this Llama 3.1 8B model on two GPUs).
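On the two-GPU question: device_map="auto" is a Hugging Face transformers argument, not a vLLM one; vLLM shards a model across GPUs via tensor_parallel_size. A minimal sketch for 2xT4 (the model id is assumed for illustration; dtype="half" because T4s do not support bfloat16):

```python
from vllm import LLM

# Sketch: shard the 8B model across both T4s via tensor parallelism.
# The model id is a placeholder; dtype="half" because T4 (compute
# capability 7.5) has no bfloat16 support.
llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",
    tensor_parallel_size=2,  # one shard per GPU
    dtype="half",
)
```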
@RonanKMcGovern this will be addressed by #8252, which will be merged very soon.
Thanks @njhill, that would be absolutely excellent!
btw, @njhill - in the meantime - can you recommend a way to install your branch+repo? I'm trying git clone and
@RonanKMcGovern this section of the docs might be helpful: https://docs.vllm.ai/en/latest/getting_started/installation.html#build-from-source-without-compilation The TL;DR is that you can copy in Python code changes from a branch after installing the latest published wheel.
Closing this now that #8252 is merged/released. Feel free to open a new issue as needed!
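For reference, a minimal offline sketch of what #8252 enables, assuming the GuidedDecodingParams API it introduced (the model id and regex are placeholders):

```python
from vllm import LLM, SamplingParams
from vllm.sampling_params import GuidedDecodingParams

# Sketch of offline guided decoding after #8252; the model id and
# regex below are placeholders for illustration.
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")

# Constrain generation to match an ISO date such as 2024-09-27
guided = GuidedDecodingParams(regex=r"\d{4}-\d{2}-\d{2}")
params = SamplingParams(guided_decoding=guided, max_tokens=16)

outputs = llm.generate(["Today's date in ISO format: "], params)
print(outputs[0].outputs[0].text)
```

The same SamplingParams should also work with llm.chat(), which accepts a sampling_params argument as well.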