
TGI fails with local LORA adapters #2253

Closed
p-davidk opened this issue Jul 19, 2024 · 6 comments · Fixed by #2555

Comments

@p-davidk

p-davidk commented Jul 19, 2024

Information

  • Docker
  • The CLI directly

Tasks

  • An officially supported command
  • My own modifications

Reproduction

Error overview

I am using TGI 2.1.1 via a docker container. When I try to run with local LORA adapters, the model fails to load. I am launching with the following command:

Launch command

docker run --gpus '"device=2,3"' --shm-size 1g -p 8000:80 -v /opt/:/data ghcr.io/huggingface/text-generation-inference:2.1.1 --model-id /data/Mixtral-8x7B-v0.1 --num-shard 2 --max-input-length 30000 --max-total-tokens 32000 --max-batch-total-tokens 1024000  --dtype bfloat16 --lora-adapters /data/pfizer2b-Mixtral8x7-07-16-24-1959-david-07-16-24-v1/checkpoint-1975,/data/pfizer2b-Mixtral8x7-07-16-24-1959-david-07-16-24-v1/checkpoint-1693

Error trace

When I do this, I see the following error trace:

2024-07-19T00:21:56.330730Z  INFO text_generation_launcher: Trying to load a Peft model. It might take a while without feedback
Error: DownloadError
2024-07-19T00:21:56.997596Z ERROR download: text_generation_launcher: Download encountered an error: 
Traceback (most recent call last):

  File "/opt/conda/lib/python3.10/site-packages/transformers/utils/hub.py", line 399, in cached_file
    resolved_file = hf_hub_download(

  File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 106, in _inner_fn
    validate_repo_id(arg_value)

  File "/opt/conda/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 154, in validate_repo_id
    raise HFValidationError(

huggingface_hub.errors.HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': '/opt/Mixtral-8x7B-v0.1/'. Use `repo_type` argument if needed.


The above exception was the direct cause of the following exception:


Traceback (most recent call last):

  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/utils/peft.py", line 15, in download_and_unload_peft
    model = AutoPeftModelForCausalLM.from_pretrained(

  File "/opt/conda/lib/python3.10/site-packages/peft/auto.py", line 104, in from_pretrained
    base_model = target_class.from_pretrained(base_model_path, **kwargs)

  File "/opt/conda/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 484, in from_pretrained
    resolved_config_file = cached_file(

  File "/opt/conda/lib/python3.10/site-packages/transformers/utils/hub.py", line 463, in cached_file
    raise EnvironmentError(

OSError: Incorrect path_or_model_id: '/opt/Mixtral-8x7B-v0.1/'. Please provide either the path to a local folder or the repo_id of a model on the Hub.


During handling of the above exception, another exception occurred:


Traceback (most recent call last):

  File "/opt/conda/bin/text-generation-server", line 8, in <module>
    sys.exit(app())

  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/cli.py", line 226, in download_weights
    utils.download_and_unload_peft(

  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/utils/peft.py", line 23, in download_and_unload_peft
    model = AutoPeftModelForSeq2SeqLM.from_pretrained(

  File "/opt/conda/lib/python3.10/site-packages/peft/auto.py", line 88, in from_pretrained
    raise ValueError(

ValueError: Expected target PEFT class: PeftModelForCausalLM, but you have asked for: PeftModelForSeq2SeqLM make sure that you are loading the correct model for your task type.

Additional info

It appears that there are two errors here:

  1. TGI is trying to load my local adapter from a repo, which fails
  2. TGI thinks one of the models is Seq2Seq instead of CausalLM.

Issue (2) doesn't make sense because the configs of the LORAs and the original model all show "task_type":"CAUSAL_LM". An example config from an adapter is below:

{
  "alpha_pattern": {},
  "auto_mapping": null,
  "base_model_name_or_path": "/opt/Mixtral-8x7B-v0.1/",
  "bias": "none",
  "fan_in_fan_out": false,
  "inference_mode": true,
  "init_lora_weights": true,
  "layers_pattern": null,
  "layers_to_transform": null,
  "lora_alpha": 32,
  "lora_dropout": 0.1,
  "modules_to_save": null,
  "peft_type": "LORA",
  "r": 256,
  "rank_pattern": {},
  "revision": null,
  "target_modules": [
    "v_proj",
    "o_proj",
    "k_proj",
    "w3",
    "w2",
    "q_proj",
    "w1",
    "gate"
  ],
  "task_type": "CAUSAL_LM"
}

All configs have this same format since they are from different checkpoints of the same finetuned model.
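For reference, a quick way to double-check this from the host (a sketch, assuming the adapters follow the standard PEFT layout with an adapter_config.json in each checkpoint directory, and that /data in the container corresponds to /opt on the host as in the launch command above):

# print the recorded task type for each adapter checkpoint
for ckpt in checkpoint-1975 checkpoint-1693; do
  grep '"task_type"' /opt/pfizer2b-Mixtral8x7-07-16-24-1959-david-07-16-24-v1/$ckpt/adapter_config.json
done
# each checkpoint prints:   "task_type": "CAUSAL_LM"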

Expected behavior

I expect the command to launch a model endpoint on port 8000 (as mapped in the launch command above). I then expect to be able to switch between adapters with the "adapter" keyword argument in the text-generation Python client.

@ErikKaum
Member

Hi @p-davidk 👋

Thanks for reporting this. Unfortunately, I don't think we'll be able to jump in and debug this right now. If you find any more clues about what could be going on, please feel free to update us here in the issue.

I'll also tag @drbh since he probably knows this part better than I do 👍

@imran3180

Support for local LORA adapters has not been released in TGI as of now. This is the merge request: #2193.

Once that change is released, you should be able to use it as described there, either via the environment variable

LORA_ADAPTERS=predibase/dbpedia,myadapter=/path/to/dir/

or via the launcher flag

--lora-adapters predibase/dbpedia,myadapter=/path/to/dir/
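Applied to the launch command in the original report, the flag would presumably look something like the following (the names ckpt1975 and ckpt1693 are just illustrative adapter labels chosen here, not something TGI requires):

--lora-adapters ckpt1975=/data/pfizer2b-Mixtral8x7-07-16-24-1959-david-07-16-24-v1/checkpoint-1975,ckpt1693=/data/pfizer2b-Mixtral8x7-07-16-24-1959-david-07-16-24-v1/checkpoint-1693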

@imran3180

Similar issue: #2143

@ErikKaum
Member

Thanks for adding the context @imran3180 💪

@nbroad1881
Contributor

I tried this today using sha-9263817, which is newer than 2.3.0; it still didn't work. It said: Repository Not Found for url: https://huggingface.co/api/models/data/phi3-adapter.

huggingface-cli download microsoft/Phi-3-mini-4k-instruct --local-dir phi3
huggingface-cli download grounded-ai/phi3-hallucination-judge --local-dir phi3-adapter

model=/data/phi3
adapter=/data/phi3-adapter
volume=$PWD

docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data ghcr.io/huggingface/text-generation-inference:sha-9263817 --model-id $model --lora-adapters $adapter

Running on A100 80GB

Narsil mentioned this issue Sep 24, 2024
@nbroad1881
Contributor

nbroad1881 commented Sep 25, 2024

To anyone arriving here looking for a solution, here is the proper way to use local lora adapters:

LORA_ADAPTERS=myadapter=/some/path/to/adapter,myadapter2=/another/path/to/adapter
curl 127.0.0.1:3000/generate \
    -X POST \
    -H 'Content-Type: application/json' \
    -d '{
  "inputs": "Hello who are you?",
  "parameters": {
    "max_new_tokens": 40,
    "adapter_id": "myadapter"
  }
}'
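For completeness, a minimal sketch of wiring that environment variable into a Docker launch (the image tag, port mapping, paths, and adapter name here are placeholders rather than values taken from this issue):

docker run --gpus all --shm-size 1g -p 3000:80 -v $PWD:/data \
    -e LORA_ADAPTERS=myadapter=/data/some/path/to/adapter \
    ghcr.io/huggingface/text-generation-inference:latest \
    --model-id /data/some/base-model

With the server up, the curl request above selects the adapter per request via "adapter_id": "myadapter".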
