PEFT version conflict #13

Open
ignaciocearuiz opened this issue Dec 8, 2024 · 7 comments
@ignaciocearuiz

Hi! I'm facing library conflicts while trying to fine-tune and generate sequences using this repository. Here's a breakdown of my setup and the issue:

Setup

  • Environment: Google Colab Pro (NVIDIA A100 40GB GPU)
  • Steps I followed:
    1. Cloned the repo and ran pip install -r requirements.txt.
    2. Replaced example.json with my train_split.json file (shown below) in the instruction_tuning_dataset folder.
    3. Installed missing dependencies (deepspeed and datasets) manually.
    4. Successfully executed run_it.sh, saving the fine-tuned model in the save_dir folder.
    5. Ran the inference script with:
      CUDA_VISIBLE_DEVICES=0 python ProLLaMA/scripts/infer.py --model "save_dir/sft_lora_model/" --interactive

train_split.json Example

[
    {
        "instruction": "[Generate by protein family]",
        "input": "family=<Zinc Fingers family>",
        "output": "Seq=<MSENSDEG...>"
    },
    {
        "instruction": "[Generate by protein family]",
        "input": "family=<Zinc Fingers family>",
        "output": "Seq=<MRHNQAKSLAQ...>"
    }
]
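
As a quick sanity check of the file format (a minimal sketch using only Python's standard library; the expected keys simply mirror the example above):

import json

with open("instruction_tuning_dataset/train_split.json") as f:
    records = json.load(f)

# Every record should provide the same fields as the example above.
required = {"instruction", "input", "output"}
for i, record in enumerate(records):
    missing = required - record.keys()
    if missing:
        raise ValueError(f"record {i} is missing keys: {missing}")

print(f"{len(records)} records look well-formed")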

Error Encountered

While running the inference script, I encountered the following error:

Traceback (most recent call last):
  File "/content/ProLLaMA/scripts/infer.py", line 42, in <module>
    model = LlamaForCausalLM.from_pretrained(
  File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 3553, in from_pretrained
    model.load_adapter(
  File "/usr/local/lib/python3.10/dist-packages/transformers/integrations/peft.py", line 137, in load_adapter
    check_peft_version(min_version=MIN_PEFT_VERSION)
  File "/usr/local/lib/python3.10/dist-packages/transformers/utils/peft_utils.py", line 120, in check_peft_version
    raise ValueError(
ValueError: The version of PEFT you are using is not compatible, please use a version that is greater than 0.5.0

Issue Description

Installing an updated version of PEFT to resolve the error creates compatibility issues with huggingface_hub. I’m unsure how to resolve this version conflict without breaking other dependencies.
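
For reference, a small diagnostic to see which versions are actually installed in the environment (not part of the repository; the check in transformers wants a peft version greater than 0.5.0):

# Print the installed versions involved in the conflict (diagnostic only).
from importlib.metadata import version, PackageNotFoundError

for pkg in ("peft", "transformers", "huggingface_hub", "torch"):
    try:
        print(pkg, version(pkg))
    except PackageNotFoundError:
        print(pkg, "not installed")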

Can someone provide guidance on how to proceed? Thanks in advance!

@Lyu6PosHao
Member

Hello!
I ran infer.py in my own environment successfully. My environment:
transformers==4.43.1
peft==0.13.1
torch==2.5.1

So if you only need to run inference, you don't have to follow requirements.txt strictly; the versions are flexible.

@ignaciocearuiz
Author

Hi! Thanks for your response :)

Those versions work fine when running inference on the base model. The conflicts arise when running inference after fine-tuning on a user-defined set of instructions. I fine-tuned with the library versions from requirements.txt, then installed the versions you mentioned, then ran the infer.py script and again got a PEFT-related error:

Loading checkpoint shards: 100% 2/2 [00:09<00:00,  4.70s/it]
Traceback (most recent call last):
  File "/content/ProLLaMA/scripts/infer.py", line 42, in <module>
    model = LlamaForCausalLM.from_pretrained(
  File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 3984, in from_pretrained
    model.load_adapter(
  File "/usr/local/lib/python3.10/dist-packages/transformers/integrations/peft.py", line 142, in load_adapter
    from peft import PeftConfig, inject_adapter_in_model, load_peft_weights
ImportError: cannot import name 'inject_adapter_in_model' from 'peft' (/content/ProLLaMA/scripts/peft/__init__.py)
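
The path in the ImportError suggests that Python is picking up peft from the local ProLLaMA/scripts/peft/ folder rather than from the pip-installed package. A quick way to confirm which module is imported (diagnostic sketch only):

# Show which peft module is actually imported and its version, if any.
import peft
print(peft.__file__)
print(getattr(peft, "__version__", "no __version__ attribute"))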

Any guidance on how to proceed would be much appreciated, as I think it could benefit everyone who wants to experiment with ProLLaMA 🙌🏻

Lyu6PosHao reopened this Dec 30, 2024
@Lyu6PosHao
Member

Thanks for your suggestion and debugging! I will fix it soon.

@Lyu6PosHao
Member

Hello!

I have updated the code for easier usage. You can check the README.md for what has changed.

I ran run_it.sh on a toy dataset in my Python environment successfully:

# the codes are based on Chinese-LLaMA-Alpaca-2
# Read the wiki (https://github.com/ymcui/Chinese-LLaMA-Alpaca-2/wiki/sft_scripts_zh) carefully before running the script
export CUDA_VISIBLE_DEVICES=3
export WANDB_PROJECT="instruction_tuning"
lr=5e-5
lora_rank=64
lora_alpha=128
lora_trainable="q_proj,v_proj,k_proj,o_proj,gate_proj,down_proj,up_proj"
lora_dropout=0.05

pretrained_model=/path/to/ProLLaMA_Stage_1 #or your local path
dataset_dir=./instruction_tuning_dataset #your dataset path
per_device_train_batch_size=144
gradient_accumulation_steps=4
max_seq_length=256
output_dir=save_dir/
deepspeed_config_file=ds_zero2_no_offload.json
torchrun  --nproc_per_node 1 instruction_tune.py \
    --deepspeed ${deepspeed_config_file} \
    --model_name_or_path ${pretrained_model} \
    --tokenizer_name_or_path ${pretrained_model} \
    --dataset_dir ${dataset_dir} \
    --per_device_train_batch_size ${per_device_train_batch_size} \
    --do_train \
    --seed 42 \
    --bf16 \
    --num_train_epochs 2 \
    --lr_scheduler_type cosine \
    --learning_rate ${lr} \
    --warmup_ratio 0.03 \
    --weight_decay 0 \
    --logging_strategy steps \
    --logging_steps 2 \
    --save_strategy steps \
    --save_total_limit 3 \
    --save_steps 1000 \
    --gradient_accumulation_steps ${gradient_accumulation_steps} \
    --preprocessing_num_workers 32 \
    --max_seq_length ${max_seq_length} \
    --output_dir ${output_dir} \
    --ddp_timeout 30000 \
    --logging_first_step True \
    --lora_rank ${lora_rank} \
    --lora_alpha ${lora_alpha} \
    --trainable ${lora_trainable} \
    --lora_dropout ${lora_dropout} \
    --torch_dtype float16 \
    --load_in_kbits 16 \
    --save_safetensors False \
    --ddp_find_unused_parameters False \
    --gradient_checkpointing \
    --merge_when_finished True \
    #--resume_from_checkpoint path_to_checkpoint \
    #--use_flash_attention_2 \

After training, I got a model in ./save_dir_merged/, and I ran the command below successfully:

CUDA_VISIBLE_DEVICES=0 python infer.py  --model ./save_dir_merged/ --interactive
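
For reference, loading the merged model without infer.py could look roughly like this (a minimal sketch with the plain Hugging Face API; the exact prompt template, including where the Seq=< suffix goes, is an assumption based on the dataset example earlier in this thread):

# Minimal inference sketch for the merged model (independent of infer.py).
# device_map="auto" requires the accelerate package.
import torch
from transformers import AutoTokenizer, LlamaForCausalLM

model_path = "./save_dir_merged/"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = LlamaForCausalLM.from_pretrained(
    model_path, torch_dtype=torch.bfloat16, device_map="auto"
)

# Prompt format follows the instruction-tuning example above; spacing is assumed.
prompt = "[Generate by protein family] family=<Zinc Fingers family> Seq=<"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))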

I just used the newest pip packages of transformers, peft, etc.; I don't think the package versions need to be strict. My environment:

transformers             4.47.1
torch                    2.5.1+cu124
sentencepiece            0.2.0
peft                     0.14.0
deepspeed                0.16.2

Please let me know if you have any questions.

Best regards

@ignaciocearuiz
Author

Hello again, and thank you so much! The updated requirements.txt file fixed the conflict.

Just four comments before closing this issue:

  1. I had to use per_device_train_batch_size=72 and gradient_accumulation_steps=8 for the script to run seamlessly on a single NVIDIA A100 GPU. Is there a rule of thumb for selecting those parameters efficiently? What's your opinion about them?
  2. I got these stats in both train epochs:
    {'loss': 4.3557, 'learning_rate': 0, 'epoch': 1.0}
    {'loss': 4.3557, 'learning_rate': 0, 'epoch': 2.0}
    
    I think the learning rate of zero must be a bug; I'm not so sure about the repeated loss value. What do you think?
  3. When running the infer.py script, I had to type Seq=< at the end of each prompt in order for it to effectively generate a sequence.
  4. I would like to make one suggestion: you could add the datasets library to the requirements too, so it won't be necessary to install it manually.

Best regards!

@Lyu6PosHao
Member

Thanks for your valuable suggestions! Here are my replies:

  1. I think the batch size should be increased first until the GPU memory is nearly full. If the batch size is not too small at that point, there is no need to turn on gradient accumulation. So I would suggest per_device_train_batch_size=72 and gradient_accumulation_steps=1 (see the small example after this list for how the two settings combine).
  2. I will check whether this is a bug caused by my code.
  3. You could manually modify this line of code in infer.py:

      input_text = raw_input_text

    to:

      input_text = raw_input_text + "Seq=<"

    Then you don't have to type it each time.
  4. I will add it to the requirements.
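
As a rough illustration of point 1: the batch the optimizer effectively sees is per_device_train_batch_size × gradient_accumulation_steps × number of GPUs, so the settings differ mainly in how much memory a single forward/backward pass needs (numbers taken from this thread, single A100):

# Effective batch size = per-device batch size * gradient accumulation * GPU count.
def effective_batch_size(per_device, grad_accum, num_gpus=1):
    return per_device * grad_accum * num_gpus

print(effective_batch_size(144, 4))  # 576, the original run_it.sh settings
print(effective_batch_size(72, 8))   # 576, same effective size but less memory per step
print(effective_batch_size(72, 1))   # 72, the suggestion above: more optimizer steps per epoch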

Sorry for the late reply.

Best regards!

@ignaciocearuiz
Author

Hello @Lyu6PosHao! Thank you for your suggestions :)

I have one more question. I noticed that, during fine-tuning, the script doesn't use a validation split; it just uses every JSON file in the instruction_tuning_dataset folder as training data. Would you recommend using a validation split and, if so, how should I implement it?
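
For context, I was considering something simple like pre-splitting the JSON myself and keeping only the training part in the instruction_tuning_dataset folder (just a sketch; the file names are illustrative, and I haven't checked whether instruction_tune.py can consume a validation file directly):

# Split the instruction data into train/validation files before fine-tuning.
import json
import random

with open("train_split.json") as f:
    records = json.load(f)

random.seed(42)
random.shuffle(records)

n_val = max(1, int(0.1 * len(records)))  # hold out roughly 10%
val, train = records[:n_val], records[n_val:]

# Only the training file goes into the folder the script reads from.
with open("instruction_tuning_dataset/train.json", "w") as f:
    json.dump(train, f, indent=2)
with open("validation_split.json", "w") as f:
    json.dump(val, f, indent=2)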
