
Help with: LoRA issue in distributed setting #1794

@alielfilali01

Description


System Info

Hello there, I'm trying to follow this tutorial from the documentation to fine-tune a model in a distributed setting (currently testing with a 7B model). I'm doing the training on Hugging Face Spaces using a Jupyter Docker image with 4 L4 GPUs (from the terminal, not a notebook).

(screenshot of the traceback)

The error is simply: ModuleNotFoundError: No module named 'torch._six'
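From what I could find, torch._six was removed in PyTorch 2.0, while older DeepSpeed releases still import it, so I suspect one of the pinned requirements files in the reproduction below pulls in a DeepSpeed version that predates that change. This is the check/upgrade I plan to try first (assuming I'm allowed to upgrade packages inside the Space):

# Show which DeepSpeed and PyTorch versions actually got installed
pip show deepspeed torch | grep -E "^(Name|Version)"

# Newer DeepSpeed releases no longer import torch._six, so upgrading
# is the usual suggestion for this error
pip install -U deepspeed

# Re-run the import that was failing
python -c "import deepspeed; print(deepspeed.__version__)"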

Who can help?

@pacman100 and @stevhliu

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder
  • My own task or dataset (give details below)

Reproduction

My script so far:

git clone https://github.com/huggingface/peft.git
cd peft

# trying to make sure everything is installed
pip install -r requirements.txt
pip install -r examples/sft/requirements_colab.txt
pip install -r examples/sft/requirements.txt

accelerate config --config_file deepspeed_config.yaml

accelerate launch --config_file "deepspeed_config.yaml" examples/sft/train.py \
--seed 100 \
--model_name_or_path "meta-llama/Llama-2-7b-hf" \
--dataset_name "AbderrahmanSkiredj1/moroccan_darija_wikipedia_dataset" \
--chat_template_format "none" \
--add_special_tokens False \
--append_concat_token False \
--splits "train,test" \
--max_seq_len 2048 \
--num_train_epochs 1 \
--logging_steps 5 \
--log_level "info" \
--logging_strategy "steps" \
--evaluation_strategy "epoch" \
--save_strategy "epoch" \
--push_to_hub \
--hub_private_repo True \
--hub_strategy "every_save" \
--bf16 True \
--packing True \
--learning_rate 1e-4 \
--lr_scheduler_type "cosine" \
--weight_decay 1e-4 \
--warmup_ratio 0.0 \
--max_grad_norm 1.0 \
--output_dir "llama2-7b-wiki-ary-sft-lora-deepspeed" \
--per_device_train_batch_size 8 \
--per_device_eval_batch_size 8 \
--gradient_accumulation_steps 4 \
--gradient_checkpointing True \
--use_reentrant False \
--dataset_text_field "content" \
--use_flash_attn True \
--use_peft_lora True \
--lora_r 8 \
--lora_alpha 16 \
--lora_dropout 0.1 \
--lora_target_modules "all-linear" \
--use_4bit_quantization False
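
In case it helps, the deepspeed_config.yaml produced by accelerate config for this setup should look roughly like the following (single node, 4 processes, ZeRO stage 3, bf16). Treat it as an illustrative sketch rather than my exact file:

compute_environment: LOCAL_MACHINE
debug: false
deepspeed_config:
  gradient_accumulation_steps: 4
  offload_optimizer_device: none
  offload_param_device: none
  zero3_init_flag: true
  zero3_save_16bit_model: true
  zero_stage: 3
distributed_type: DEEPSPEED
downcast_bf16: 'no'
machine_rank: 0
main_training_function: main
mixed_precision: bf16
num_machines: 1
num_processes: 4
rdzv_backend: static
same_network: true
use_cpu: false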

Expected behavior

To finish training and push the adapter to the Hub.
