Add single command LLM deployment #3209
frontend/server/src/main/java/org/pytorch/serve/wlm/AsyncWorkerThread.java
Looks good!
I had one comment on the token. Please check it and change it if you think that makes sense.
You can then go ahead and launch a TorchServe instance serving your selected model:
```bash
docker run --rm -ti --gpus all -e HUGGING_FACE_HUB_TOKEN=$token -p 8080:8080 -v data:/data ts/llm --model_id meta-llama/Meta-Llama-3-8B-Instruct --disable_token
```
Is -e HUGGING_FACE_HUB_TOKEN=$token needed here?
Why not set it directly with export HUGGING_FACE_HUB_TOKEN=<HUGGINGFACE_HUB_TOKEN>?
Also, from a security point of view, does the existing command print the token?
AFAIK Docker will not pick up environment variables from the calling environment, so you would still have
export HUGGING_FACE_HUB_TOKEN=<HUGGINGFACE_HUB_TOKEN>
docker run --rm -ti --gpus all -e HUGGING_FACE_HUB_TOKEN=$HUGGING_FACE_HUB_TOKEN ...
which is even longer and comes down to the same process. For a RL deployment, the token variable would be set through a secret, so the token would not show up in any of the logs.
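On the question of whether the command exposes the token: one option not spelled out in this thread is Docker's name-only form of -e, which copies the named variable from the calling environment into the container without putting its value on the command line. The sketch below assumes the token has already been exported in the calling shell. The token value is a placeholder, and the helper name launch_cmd is hypothetical and not part of this PR. It only prints the command as a dry run instead of invoking Docker.

```shell
# Placeholder value for illustration only; a real token would come from a secret store.
HUGGING_FACE_HUB_TOKEN="hf_placeholder_not_a_real_token"
export HUGGING_FACE_HUB_TOKEN

# Hypothetical helper: prints the launch command without running it.
# "-e NAME" with no "=value" asks Docker to take the value from the
# calling environment, so the token itself never appears in the arguments.
launch_cmd() {
  printf '%s\n' "docker run --rm -ti --gpus all -e HUGGING_FACE_HUB_TOKEN -p 8080:8080 -v data:/data ts/llm --model_id meta-llama/Meta-Llama-3-8B-Instruct --disable_token"
}

# Dry run: show what would be executed.
launch_cmd

# To launch for real (requires Docker, a GPU, and the ts/llm image):
# eval "$(launch_cmd)"
```

This keeps the token value out of the command text itself, and therefore out of shell history and process listings for that command. It does not hide the variable inside the container, and docker inspect will still show it, so a secret mechanism remains the better fit for production deployments.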
Description
This PR adds a feature to TorchServe to deploy an LLM with a single command.
It uses a new ts.launcher interface to start and stop TorchServe and adds ts.llm_launcher to launch an LLM given its Hugging Face Hub identifier using our new vLLM integration.
It also adds a Docker image based on our GPU image to easily run this without installing TorchServe.