Description
System Info
Hello there, I'm trying to follow this tutorial from the documentation in order to fine-tune a model in a distributed setting (currently testing with a 7B model). I'm doing the training in a Hugging Face Space, using a Jupyter Docker image with 4 L4 GPUs (from the terminal, not a notebook).
The error is simply: ModuleNotFoundError: No module named 'torch._six'
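For reference, here is a quick version check I can run in the same environment; upgrading DeepSpeed/accelerate is only a guess on my side, since torch._six was removed in newer PyTorch releases and older DeepSpeed builds still import it. The exact commands below are illustrative, not steps from the tutorial:

# Sanity check (assumed diagnostic, not part of the tutorial): print the installed versions,
# since torch._six no longer exists in recent PyTorch and older DeepSpeed releases import it.
python -c "import torch; print('torch', torch.__version__)"
python -c "import deepspeed; print('deepspeed', deepspeed.__version__)"
python -c "import accelerate; print('accelerate', accelerate.__version__)"
# Possible workaround (untested guess): move to releases that no longer reference torch._six
pip install -U deepspeed accelerate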
Who can help?
@pacman100 and @stevhliu
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder
- My own task or dataset (give details below)
Reproduction
My script so far:
git clone https://github.com/huggingface/peft.git
cd peft
# trying to make sure everything is installed
pip install -r requirements.txt
pip install -r examples/sft/requirements_colab.txt
pip install -r examples/sft/requirements.txt
accelerate config --config_file deepspeed_config.yaml
accelerate launch --config_file "deepspeed_config.yaml" examples/sft/train.py \
--seed 100 \
--model_name_or_path "meta-llama/Llama-2-7b-hf" \
--dataset_name "AbderrahmanSkiredj1/moroccan_darija_wikipedia_dataset" \
--chat_template_format "none" \
--add_special_tokens False \
--append_concat_token False \
--splits "train,test" \
--max_seq_len 2048 \
--num_train_epochs 1 \
--logging_steps 5 \
--log_level "info" \
--logging_strategy "steps" \
--evaluation_strategy "epoch" \
--save_strategy "epoch" \
--push_to_hub \
--hub_private_repo True \
--hub_strategy "every_save" \
--bf16 True \
--packing True \
--learning_rate 1e-4 \
--lr_scheduler_type "cosine" \
--weight_decay 1e-4 \
--warmup_ratio 0.0 \
--max_grad_norm 1.0 \
--output_dir "llama2-7b-wiki-ary-sft-lora-deepspeed" \
--per_device_train_batch_size 8 \
--per_device_eval_batch_size 8 \
--gradient_accumulation_steps 4 \
--gradient_checkpointing True \
--use_reentrant False \
--dataset_text_field "content" \
--use_flash_attn True \
--use_peft_lora True \
--lora_r 8 \
--lora_alpha 16 \
--lora_dropout 0.1 \
--lora_target_modules "all-linear" \
--use_4bit_quantization False
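For completeness, here is a minimal sketch of the kind of deepspeed_config.yaml that the accelerate config step above can generate (ZeRO stage 3, 4 processes for the 4 L4 GPUs). The values below are illustrative assumptions rather than a copy of my actual file, which is produced by the interactive prompts:

compute_environment: LOCAL_MACHINE
debug: false
deepspeed_config:
  # assumed ZeRO-3 settings for illustration only
  gradient_accumulation_steps: 4
  offload_optimizer_device: none
  offload_param_device: none
  zero3_init_flag: true
  zero3_save_16bit_model: true
  zero_stage: 3
distributed_type: DEEPSPEED
downcast_bf16: 'no'
machine_rank: 0
main_training_function: main
mixed_precision: bf16
num_machines: 1
num_processes: 4  # one process per L4 GPU
use_cpu: false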
Expected behavior
To finish training and push the adapter to the Hub.
