TGI fails with local LORA adapters #2253
Comments
Support for local LoRA adapters is not yet released in TGI. The change is in merge request #2193. Once it is released, you should be able to use it as described below.
Similar issue: #2143
Thanks for adding the context @imran3180 💪
I tried this today using:
huggingface-cli download microsoft/Phi-3-mini-4k-instruct --local-dir phi3
huggingface-cli download grounded-ai/phi3-hallucination-judge --local-dir phi3-adapter
model=/data/phi3
adapter=/data/phi3-adapter
volume=$PWD
docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data ghcr.io/huggingface/text-generation-inference:sha-9263817 --model-id $model --lora-adapters $adapter
Running on an A100 80GB.
To anyone arriving here looking for a solution, here is the proper way to use local LoRA adapters:
LORA_ADAPTERS=myadapter=/some/path/to/adapter,myadapter2=/another/path/to/adapter
curl 127.0.0.1:3000/generate \
    -X POST \
    -H 'Content-Type: application/json' \
    -d '{
        "inputs": "Hello who are you?",
        "parameters": {
            "max_new_tokens": 40,
            "adapter_id": "myadapter"
        }
    }'
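If you are running TGI in Docker, a minimal sketch of passing that variable to the container is below; the adapter name, paths, and image tag are placeholders, and this assumes the launcher reads LORA_ADAPTERS from the environment as the snippet above implies:
docker run --gpus all --shm-size 1g -p 3000:80 -v $PWD:/data \
  -e LORA_ADAPTERS=myadapter=/data/path/to/adapter \
  ghcr.io/huggingface/text-generation-inference:latest \
  --model-id /data/base-model
The name to the left of the = is what you then pass as adapter_id in each request.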
Reproduction
Error overview
I am using TGI 2.1.1 via a Docker container. When I try to run with local LoRA adapters, the model fails to load. I am launching with the following command:
Launch command
docker run --gpus '"device=2,3"' --shm-size 1g -p 8000:80 -v /opt/:/data ghcr.io/huggingface/text-generation-inference:2.1.1 --model-id /data/Mixtral-8x7B-v0.1 --num-shard 2 --max-input-length 30000 --max-total-tokens 32000 --max-batch-total-tokens 1024000 --dtype bfloat16 --lora-adapters /data/pfizer2b-Mixtral8x7-07-16-24-1959-david-07-16-24-v1/checkpoint-1975,/data/pfizer2b-Mixtral8x7-07-16-24-1959-david-07-16-24-v1/checkpoint-1693
Error trace
When I do this, I see the following error trace:
Additional info
It appears that there are two errors here:
Issue (2) doesn't make sense because the configs of the LoRAs and the original model all show "task_type": "CAUSAL_LM". An example config from an adapter is below. All configs have this same format since they are from different checkpoints of the same finetuned model.
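For illustration only, a PEFT adapter_config.json of this kind generally looks like the following; every value except task_type is a hypothetical placeholder and is not taken from the actual checkpoints:
{
  "base_model_name_or_path": "mistralai/Mixtral-8x7B-v0.1",
  "peft_type": "LORA",
  "task_type": "CAUSAL_LM",
  "r": 16,
  "lora_alpha": 32,
  "lora_dropout": 0.05,
  "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj"]
}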
Expected behavior
I expect the script to launch a model endpoint at port 8080. I then expect to be able to switch between adapters with the "adapter" keyword argument in the text-generation Python client.
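Based on the name=path launcher syntax and the adapter_id request parameter shown in the comments above, switching between the two checkpoints would presumably look something like this over the HTTP API (the adapter name checkpoint-1975 is a hypothetical label; whether the text-generation Python client exposes the same parameter is not shown in this thread):
curl 127.0.0.1:8080/generate \
    -X POST \
    -H 'Content-Type: application/json' \
    -d '{
        "inputs": "Hello who are you?",
        "parameters": {
            "max_new_tokens": 40,
            "adapter_id": "checkpoint-1975"
        }
    }'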