TGI fails with local LORA adapters #2253
Comments
Support for local LoRA adapters is not yet released in TGI. The change is in merge request #2193. Once it is released, you should be able to use it as described below.
Similar issue: #2143
Thanks for adding the context @imran3180 💪
I tried this today using:
huggingface-cli download microsoft/Phi-3-mini-4k-instruct --local-dir phi3
huggingface-cli download grounded-ai/phi3-hallucination-judge --local-dir phi3-adapter
model=/data/phi3
adapter=/data/phi3-adapter
volume=$PWD
docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data ghcr.io/huggingface/text-generation-inference:sha-9263817 --model-id $model --lora-adapters $adapter
Running on an A100 80GB.
To anyone arriving here looking for a solution, here is the proper way to use local LoRA adapters:
LORA_ADAPTERS=myadapter=/some/path/to/adapter,myadapter2=/another/path/to/adapter
curl 127.0.0.1:3000/generate \
    -X POST \
    -H 'Content-Type: application/json' \
    -d '{
        "inputs": "Hello who are you?",
        "parameters": {
            "max_new_tokens": 40,
            "adapter_id": "myadapter"
        }
    }'
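If you are running TGI in Docker, a minimal sketch of passing that variable to the container is below; the adapter name, paths, and image tag are placeholders, and this assumes the launcher reads LORA_ADAPTERS from the environment as the snippet above implies:
docker run --gpus all --shm-size 1g -p 3000:80 -v $PWD:/data \
  -e LORA_ADAPTERS=myadapter=/data/path/to/adapter \
  ghcr.io/huggingface/text-generation-inference:latest \
  --model-id /data/base-model
The name to the left of the = is what you then pass as adapter_id in each request.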
Reproduction
Error overview
I am using TGI 2.1.1 via a Docker container. When I try to run with local LoRA adapters, the model fails to load. I am launching with the following command:
Launch command
docker run --gpus '"device=2,3"' --shm-size 1g -p 8000:80 -v /opt/:/data ghcr.io/huggingface/text-generation-inference:2.1.1 --model-id /data/Mixtral-8x7B-v0.1 --num-shard 2 --max-input-length 30000 --max-total-tokens 32000 --max-batch-total-tokens 1024000 --dtype bfloat16 --lora-adapters /data/pfizer2b-Mixtral8x7-07-16-24-1959-david-07-16-24-v1/checkpoint-1975,/data/pfizer2b-Mixtral8x7-07-16-24-1959-david-07-16-24-v1/checkpoint-1693
Error trace
When I do this, I see the following error trace:
Additional info
It appears that there are two errors here:
Issue (2) doesn't make sense because the configs of the LoRAs and the original model all show "task_type": "CAUSAL_LM". An example config from an adapter is below. All configs have this same format since they are from different checkpoints of the same finetuned model.
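For illustration only, a PEFT adapter_config.json of this kind generally looks like the following; every value except task_type is a hypothetical placeholder and is not taken from the actual checkpoints:
{
  "base_model_name_or_path": "mistralai/Mixtral-8x7B-v0.1",
  "peft_type": "LORA",
  "task_type": "CAUSAL_LM",
  "r": 16,
  "lora_alpha": 32,
  "lora_dropout": 0.05,
  "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj"]
}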
Expected behavior
I expect the script to launch a model endpoint at port 8080. I then expect to be able to switch between adapters with the "adapter" keyword argument in the text-generation Python client.
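Based on the name=path launcher syntax and the adapter_id request parameter shown in the comments above, switching between the two checkpoints would presumably look something like this over the HTTP API (the adapter name checkpoint-1975 is a hypothetical label; whether the text-generation Python client exposes the same parameter is not shown in this thread):
curl 127.0.0.1:8080/generate \
    -X POST \
    -H 'Content-Type: application/json' \
    -d '{
        "inputs": "Hello who are you?",
        "parameters": {
            "max_new_tokens": 40,
            "adapter_id": "checkpoint-1975"
        }
    }'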