Closed
Description
System Info
Hi,
I am trying to run a QLoRA fine-tuning experiment with DeepSpeed, similar to the one in the sft folder.
I use the same requirements.txt as given there. However, training fails at argument parsing with the following error:
Traceback (most recent call last):
  File "/opt/ml/code/train_qlora.py", line 161, in <module>
    model_args, data_args, training_args = parser.parse_args_into_dataclasses()
  File "/opt/conda/lib/python3.10/site-packages/transformers/hf_argparser.py", line 338, in parse_args_into_dataclasses
    obj = dtype(**inputs)
  File "<string>", line 123, in __init__
  File "/opt/conda/lib/python3.10/site-packages/transformers/training_args.py", line 1528, in __post_init__
    and (self.device.type != "cuda")
  File "/opt/conda/lib/python3.10/site-packages/transformers/training_args.py", line 1995, in device
    return self._setup_devices
  File "/opt/conda/lib/python3.10/site-packages/transformers/utils/generic.py", line 56, in __get__
    cached = self.fget(obj)
  File "/opt/conda/lib/python3.10/site-packages/transformers/training_args.py", line 1931, in _setup_devices
    self.distributed_state = PartialState(
  File "/opt/conda/lib/python3.10/site-packages/accelerate/state.py", line 180, in __init__
    from deepspeed import comm as dist
ImportError: cannot import name 'comm' from 'deepspeed' (/opt/conda/lib/python3.10/site-packages/deepspeed/__init__.py)
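To narrow down the ImportError, it may help to check in the same environment whether the deepspeed package resolves at all, which file it resolves to, and whether its comm submodule imports on its own. The following is a minimal diagnostic sketch, not a fix; the helper name probe_submodule is hypothetical and not part of any library.

```python
import importlib
import importlib.util


def probe_submodule(package: str, submodule: str) -> dict:
    """Report whether `package` is installed and `package.submodule` imports.

    `probe_submodule` is a hypothetical helper written for this diagnosis.
    """
    report = {
        "package": package,
        "found": False,        # could the top-level package be located?
        "location": None,      # file the package would be loaded from
        "submodule_ok": False, # did `package.submodule` import cleanly?
        "error": None,         # exception text if the import failed
    }
    # find_spec locates a top-level package without importing it.
    spec = importlib.util.find_spec(package)
    if spec is None:
        return report
    report["found"] = True
    report["location"] = spec.origin
    try:
        importlib.import_module(f"{package}.{submodule}")
        report["submodule_ok"] = True
    except Exception as exc:  # record any import-time failure, not only ImportError
        report["error"] = f"{type(exc).__name__}: {exc}"
    return report


if __name__ == "__main__":
    # The traceback fails on `from deepspeed import comm`, so probe that pair.
    print(probe_submodule("deepspeed", "comm"))
```

If location points somewhere other than the expected site-packages directory, or error shows a different underlying exception, that would suggest a shadowed or incomplete deepspeed install rather than a problem in the training script.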
Who can help?
No response
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder
- My own task or dataset (give details below)
Reproduction
requirements.txt:
git+https://github.com/huggingface/transformers@v4.38.2
git+https://github.com/huggingface/accelerate@v0.28.0
git+https://github.com/huggingface/peft@v0.9.0
git+https://github.com/huggingface/trl@v0.7.11
deepspeed
PyGithub
flash-attn
huggingface-hub
evaluate
datasets
bitsandbytes==0.43.0
einops
wandb
tensorboard
tiktoken
pandas
numpy
scipy
matplotlib
sentencepiece
nltk
xformers
hf_transfer
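Because deepspeed is unpinned in this list, the version actually installed depends on when the environment was built. A small sketch for recording what was resolved, assuming the packages are installed under the distribution names shown (the PACKAGES list and the installed_versions helper are illustrative, not part of the example scripts):

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names taken from the requirements list above (a subset).
PACKAGES = ["transformers", "accelerate", "peft", "trl", "deepspeed", "bitsandbytes"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    result = {}
    for name in names:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    for name, ver in installed_versions(PACKAGES).items():
        print(f"{name}: {ver or 'not installed'}")
```

Including this output in the report makes it easier to tell which deepspeed release was paired with accelerate v0.28.0 in the failing environment.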
train file:
peft/examples/sft/train.py
Expected behavior
Training should start and run without the ImportError shown above.