
Error running the DeepSpeed QLoRA example #1636

@AnirudhVIyer

Description


System Info

Hi,
I am trying to run a QLoRA fine-tuning experiment with DeepSpeed, similar to the one in the sft examples folder.
I use the same requirements.txt as given. However, I get the following error:


Traceback (most recent call last):
  File "/opt/ml/code/train_qlora.py", line 161, in <module>
    model_args, data_args, training_args = parser.parse_args_into_dataclasses()
  File "/opt/conda/lib/python3.10/site-packages/transformers/hf_argparser.py", line 338, in parse_args_into_dataclasses
    obj = dtype(**inputs)
  File "<string>", line 123, in __init__
  File "/opt/conda/lib/python3.10/site-packages/transformers/training_args.py", line 1528, in __post_init__
    and (self.device.type != "cuda")
  File "/opt/conda/lib/python3.10/site-packages/transformers/training_args.py", line 1995, in device
    return self._setup_devices
  File "/opt/conda/lib/python3.10/site-packages/transformers/utils/generic.py", line 56, in __get__
    cached = self.fget(obj)
  File "/opt/conda/lib/python3.10/site-packages/transformers/training_args.py", line 1931, in _setup_devices
    self.distributed_state = PartialState(
  File "/opt/conda/lib/python3.10/site-packages/accelerate/state.py", line 180, in __init__
    from deepspeed import comm as dist
ImportError: cannot import name 'comm' from 'deepspeed' (/opt/conda/lib/python3.10/site-packages/deepspeed/__init__.py)

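The traceback shows the failure happens when accelerate's `PartialState` runs `from deepspeed import comm as dist` (accelerate/state.py, line 180). A minimal diagnostic sketch for attaching environment details to a report like this: it records the installed versions of the libraries from requirements.txt and tries that same DeepSpeed submodule import in isolation. The helper names `installed_versions` and `can_import` are hypothetical, not part of any of these libraries.

```python
import importlib
import importlib.metadata

# Libraries from requirements.txt that appear in (or drive) the traceback.
PACKAGES = ["deepspeed", "accelerate", "transformers", "peft", "trl"]


def installed_versions(packages):
    """Return {package: installed version string, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = importlib.metadata.version(name)
        except importlib.metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def can_import(module_name):
    """Try to import module_name; return (ok, "ExceptionType: message")."""
    try:
        importlib.import_module(module_name)
        return True, ""
    except Exception as exc:  # ImportError, or anything raised while the package initializes
        return False, f"{type(exc).__name__}: {exc}"


if __name__ == "__main__":
    for name, version in installed_versions(PACKAGES).items():
        print(f"{name}: {version or 'not installed'}")
    # Same submodule that accelerate imports in PartialState.__init__.
    ok, err = can_import("deepspeed.comm")
    print("deepspeed.comm importable:", ok, err)
```

Running this inside the training container, rather than on a local machine, is what matters here, since the failing paths are under /opt/conda. DeepSpeed also ships a `ds_report` command that prints its own environment summary.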

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder
  • My own task or dataset (give details below)

Reproduction

requirements.txt:
git+https://github.com/huggingface/transformers@v4.38.2
git+https://github.com/huggingface/accelerate@v0.28.0
git+https://github.com/huggingface/peft@v0.9.0
git+https://github.com/huggingface/trl@v0.7.11
deepspeed
PyGithub
flash-attn
huggingface-hub
evaluate
datasets
bitsandbytes==0.43.0
einops
wandb
tensorboard
tiktoken
pandas
numpy
scipy
matplotlib
sentencepiece
nltk
xformers
hf_transfer
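Several entries in this requirements.txt have no version pin, including deepspeed, so the DeepSpeed version that gets installed depends on when the environment is built. Whether that is the cause of the ImportError is an assumption, not something the traceback confirms. A small sketch to list the unpinned entries; `unpinned_requirements` is a hypothetical helper, and it treats only `==` (or a trailing `@<ref>` on a git+ URL) as a pin.

```python
import re


def unpinned_requirements(lines):
    """Return the requirements in `lines` that do not pin a version.

    A plain requirement counts as pinned only with '=='; a git+ URL counts as
    pinned when it ends in '@<ref>' (a tag, branch, or commit).
    """
    unpinned = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        if line.startswith("git+"):
            if not re.search(r"@[^/@]+$", line):
                unpinned.append(line)
        elif "==" not in line:
            # Keep only the distribution name, e.g. "numpy>=1.24" -> "numpy".
            unpinned.append(re.split(r"[<>=!~\[;\s]", line, maxsplit=1)[0])
    return unpinned


if __name__ == "__main__":
    sample = [
        "git+https://github.com/huggingface/transformers@v4.38.2",
        "deepspeed",
        "bitsandbytes==0.43.0",
        "flash-attn",
    ]
    print(unpinned_requirements(sample))  # ['deepspeed', 'flash-attn']
```

Pinning deepspeed to the exact version from the failing environment (for example, the one the diagnostic above reports) would make the reproduction deterministic.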

Training script:
peft/examples/sft/train.py

Expected behavior

Training should run without errors.
