PEFT version conflict #13

Open
ignaciocearuiz opened this issue Dec 8, 2024 · 7 comments
@ignaciocearuiz

Hi! I'm facing library conflicts while trying to fine-tune and generate sequences using this repository. Here's a breakdown of my setup and the issue:

Setup

  • Environment: Google Colab Pro (NVIDIA A100 40GB GPU)
  • Steps I followed:
    1. Cloned the repo and ran pip install -r requirements.txt.
    2. Replaced example.json with my train_split.json file (shown below) in the instruction_tuning_dataset folder.
    3. Installed missing dependencies (deepspeed and datasets) manually.
    4. Successfully executed run_it.sh, saving the fine-tuned model in the save_dir folder.
    5. Ran the inference script with:
      CUDA_VISIBLE_DEVICES=0 python ProLLaMA/scripts/infer.py --model "save_dir/sft_lora_model/" --interactive

train_split.json Example

[
    {
        "instruction": "[Generate by protein family]",
        "input": "family=<Zinc Fingers family>",
        "output": "Seq=<MSENSDEG...>"
    },
    {
        "instruction": "[Generate by protein family]",
        "input": "family=<Zinc Fingers family>",
        "output": "Seq=<MRHNQAKSLAQ...>"
    }
]
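
As a quick sanity check of the file format (a minimal sketch using only Python's standard library; the expected keys simply mirror the example above):

import json

with open("instruction_tuning_dataset/train_split.json") as f:
    records = json.load(f)

# Every record should provide the same fields as the example above.
required = {"instruction", "input", "output"}
for i, record in enumerate(records):
    missing = required - record.keys()
    if missing:
        raise ValueError(f"record {i} is missing keys: {missing}")

print(f"{len(records)} records look well-formed")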

Error Encountered

While running the inference script, I encountered the following error:

Traceback (most recent call last):
  File "/content/ProLLaMA/scripts/infer.py", line 42, in <module>
    model = LlamaForCausalLM.from_pretrained(
  File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 3553, in from_pretrained
    model.load_adapter(
  File "/usr/local/lib/python3.10/dist-packages/transformers/integrations/peft.py", line 137, in load_adapter
    check_peft_version(min_version=MIN_PEFT_VERSION)
  File "/usr/local/lib/python3.10/dist-packages/transformers/utils/peft_utils.py", line 120, in check_peft_version
    raise ValueError(
ValueError: The version of PEFT you are using is not compatible, please use a version that is greater than 0.5.0

Issue Description

Installing an updated version of PEFT to resolve the error creates compatibility issues with huggingface_hub. I’m unsure how to resolve this version conflict without breaking other dependencies.
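
For reference, a small diagnostic to see which versions are actually installed in the environment (not part of the repository; the check in transformers wants a peft version greater than 0.5.0):

# Print the installed versions involved in the conflict (diagnostic only).
from importlib.metadata import version, PackageNotFoundError

for pkg in ("peft", "transformers", "huggingface_hub", "torch"):
    try:
        print(pkg, version(pkg))
    except PackageNotFoundError:
        print(pkg, "not installed")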

Can someone provide guidance on how to proceed? Thanks in advance!

@Lyu6PosHao
Member

Hello!
I ran infer.py in my own environment successfully. My environment:
transformers==4.43.1
peft==0.13.1
torch==2.5.1

So if you only need to run inference, you don't have to follow requirements.txt strictly; the versions are flexible.

@ignaciocearuiz
Author

Hi! Thanks for your response :)

Those versions work fine when running inference on the base model. The conflicts arise when running inference after fine-tuning on a user-defined set of instructions. I fine-tuned with the library versions from requirements.txt, then installed the versions you mentioned, then ran the infer.py script and again got a PEFT-related error:

Loading checkpoint shards: 100% 2/2 [00:09<00:00,  4.70s/it]
Traceback (most recent call last):
  File "/content/ProLLaMA/scripts/infer.py", line 42, in <module>
    model = LlamaForCausalLM.from_pretrained(
  File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 3984, in from_pretrained
    model.load_adapter(
  File "/usr/local/lib/python3.10/dist-packages/transformers/integrations/peft.py", line 142, in load_adapter
    from peft import PeftConfig, inject_adapter_in_model, load_peft_weights
ImportError: cannot import name 'inject_adapter_in_model' from 'peft' (/content/ProLLaMA/scripts/peft/__init__.py)
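
The path in the ImportError suggests that Python is picking up peft from the local ProLLaMA/scripts/peft/ folder rather than from the pip-installed package. A quick way to confirm which module is imported (diagnostic sketch only):

# Show which peft module is actually imported and its version, if any.
import peft
print(peft.__file__)
print(getattr(peft, "__version__", "no __version__ attribute"))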

Any guidance on how to proceed would be much appreciated, as I think it could benefit everyone who wants to experiment with ProLLaMA 🙌🏻

Lyu6PosHao reopened this Dec 30, 2024
@Lyu6PosHao
Member

Thanks for your suggestion and debugging! I will fix it soon.

@Lyu6PosHao
Member

Hello!

I have updated the code for easier usage. You can check the README.md for what has changed.

I ran run_it.sh on a toy dataset in my Python environment successfully:

# the codes are based on Chinese-LLaMA-Alpaca-2
# Read the wiki (https://github.com/ymcui/Chinese-LLaMA-Alpaca-2/wiki/sft_scripts_zh) carefully before running the script
export CUDA_VISIBLE_DEVICES=3
export WANDB_PROJECT="instruction_tuning"
lr=5e-5
lora_rank=64
lora_alpha=128
lora_trainable="q_proj,v_proj,k_proj,o_proj,gate_proj,down_proj,up_proj"
lora_dropout=0.05

pretrained_model=/path/to/ProLLaMA_Stage_1 #or your local path
dataset_dir=./instruction_tuning_dataset #your dataset path
per_device_train_batch_size=144
gradient_accumulation_steps=4
max_seq_length=256
output_dir=save_dir/
deepspeed_config_file=ds_zero2_no_offload.json
torchrun  --nproc_per_node 1 instruction_tune.py \
    --deepspeed ${deepspeed_config_file} \
    --model_name_or_path ${pretrained_model} \
    --tokenizer_name_or_path ${pretrained_model} \
    --dataset_dir ${dataset_dir} \
    --per_device_train_batch_size ${per_device_train_batch_size} \
    --do_train \
    --seed 42 \
    --bf16 \
    --num_train_epochs 2 \
    --lr_scheduler_type cosine \
    --learning_rate ${lr} \
    --warmup_ratio 0.03 \
    --weight_decay 0 \
    --logging_strategy steps \
    --logging_steps 2 \
    --save_strategy steps \
    --save_total_limit 3 \
    --save_steps 1000 \
    --gradient_accumulation_steps ${gradient_accumulation_steps} \
    --preprocessing_num_workers 32 \
    --max_seq_length ${max_seq_length} \
    --output_dir ${output_dir} \
    --ddp_timeout 30000 \
    --logging_first_step True \
    --lora_rank ${lora_rank} \
    --lora_alpha ${lora_alpha} \
    --trainable ${lora_trainable} \
    --lora_dropout ${lora_dropout} \
    --torch_dtype float16 \
    --load_in_kbits 16 \
    --save_safetensors False \
    --ddp_find_unused_parameters False \
    --gradient_checkpointing \
    --merge_when_finished True \
    #--resume_from_checkpoint path_to_checkpoint \
    #--use_flash_attention_2 \

After training, I got a model in ./save_dir_merged/, and I ran the command below successfully:

CUDA_VISIBLE_DEVICES=0 python infer.py  --model ./save_dir_merged/ --interactive
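
For reference, loading the merged model without infer.py could look roughly like this (a minimal sketch with the plain Hugging Face API; the exact prompt template, including where the Seq=< suffix goes, is an assumption based on the dataset example earlier in this thread):

# Minimal inference sketch for the merged model (independent of infer.py).
# device_map="auto" requires the accelerate package.
import torch
from transformers import AutoTokenizer, LlamaForCausalLM

model_path = "./save_dir_merged/"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = LlamaForCausalLM.from_pretrained(
    model_path, torch_dtype=torch.bfloat16, device_map="auto"
)

# Prompt format follows the instruction-tuning example above; spacing is assumed.
prompt = "[Generate by protein family] family=<Zinc Fingers family> Seq=<"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))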

I just used the newest pip packages of transformers, peft, etc.; I don't think the package versions need to be strict. My environment:

transformers             4.47.1
torch                    2.5.1+cu124
sentencepiece            0.2.0
peft                     0.14.0
deepspeed                0.16.2

Please let me know if you have any questions.

Best regards

@ignaciocearuiz
Author

Hello again, and thank you so much! The updated requirements.txt file fixed the conflict.

Just four comments before closing this issue:

  1. I had to use per_device_train_batch_size=72 and gradient_accumulation_steps=8 for the script to run seamlessly on a single NVIDIA A100 GPU. Is there a rule of thumb for selecting those parameters efficiently? What's your opinion about them?
  2. I got these stats in both train epochs:
    {'loss': 4.3557, 'learning_rate': 0, 'epoch': 1.0}
    {'loss': 4.3557, 'learning_rate': 0, 'epoch': 2.0}
    
    I think the learning rate of zero must be a bug; I'm not so sure about the repeated loss value. What do you think?
  3. When running the infer.py script, I had to type Seq=< at the end of each prompt in order for it to effectively generate a sequence.
  4. I would like to make one suggestion: you could add the datasets library to the requirements too, so it won't be necessary to install it manually.

Best regards!

@Lyu6PosHao
Member

Thanks for your valuable suggestions! Here are my replies:

  1. I think the batch size should be increased first until the GPU memory is nearly full. If the batch size is not too small at that point, there is no need to turn on gradient accumulation. So I would suggest per_device_train_batch_size=72 and gradient_accumulation_steps=1 (see the small example after this list for how the two settings combine).
  2. I will check whether this is a bug caused by my code.
  3. You could manually modify this line of code in infer.py:

      input_text = raw_input_text

    to:

      input_text = raw_input_text + "Seq=<"

    Then you don't have to type it each time.
  4. I will add it to the requirements.
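
As a rough illustration of point 1: the batch the optimizer effectively sees is per_device_train_batch_size × gradient_accumulation_steps × number of GPUs, so the settings differ mainly in how much memory a single forward/backward pass needs (numbers taken from this thread, single A100):

# Effective batch size = per-device batch size * gradient accumulation * GPU count.
def effective_batch_size(per_device, grad_accum, num_gpus=1):
    return per_device * grad_accum * num_gpus

print(effective_batch_size(144, 4))  # 576, the original run_it.sh settings
print(effective_batch_size(72, 8))   # 576, same effective size but less memory per step
print(effective_batch_size(72, 1))   # 72, the suggestion above: more optimizer steps per epoch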

Sorry for the late reply.

Best regards!

@ignaciocearuiz
Author

Hello @Lyu6PosHao! Thank you for your suggestions :)

I have one more question. I noticed that, during fine-tuning, the script doesn't use a validation split; it just uses every JSON file in the instruction_tuning_dataset folder as training data. Would you recommend using a validation split and, if so, how should I implement it?
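
For context, I was considering something simple like pre-splitting the JSON myself and keeping only the training part in the instruction_tuning_dataset folder (just a sketch; the file names are illustrative, and I haven't checked whether instruction_tune.py can consume a validation file directly):

# Split the instruction data into train/validation files before fine-tuning.
import json
import random

with open("train_split.json") as f:
    records = json.load(f)

random.seed(42)
random.shuffle(records)

n_val = max(1, int(0.1 * len(records)))  # hold out roughly 10%
val, train = records[:n_val], records[n_val:]

# Only the training file goes into the folder the script reads from.
with open("instruction_tuning_dataset/train.json", "w") as f:
    json.dump(train, f, indent=2)
with open("validation_split.json", "w") as f:
    json.dump(val, f, indent=2)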
