Error when running the training script: "ValueError: You can't train a model that has been loaded with device_map='auto' in any distributed mode. Please rerun your script specifying --num_processes=1 or by launching with python {{myscript.py}}." #32

Open
litsh opened this issue Jun 27, 2024 · 0 comments

litsh commented Jun 27, 2024

I am running train.sh in an environment where all packages were installed with:

pip install -r requirements.txt

But it fails with the following error:

Traceback (most recent call last):
  File "train_huatuo.py", line 265, in <module>
    train(args)
  File "train_huatuo.py", line 145, in train
    model, optimizer, train_dataloader,  lr_scheduler = accelerator.prepare(model, optimizer, train_dataloader, lr_scheduler)
  File "/fdudata/tsli/HuatuoGPT-II/huatuo2/lib/python3.8/site-packages/accelerate/accelerator.py", line 1250, in prepare
    raise ValueError(
ValueError: You can't train a model that has been loaded with `device_map='auto'` in any distributed mode. Please rerun your script specifying `--num_processes=1` or by launching with `python {{myscript.py}}`.

I have already changed the "--num_processes" flag to 1, but it still gives the same error. Are there any suggestions for solving this problem?
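For anyone trying to narrow this down, here is a minimal sketch of the condition that the error message appears to describe. It is a hypothetical model, not the actual accelerate source. It assumes the check fires when the model carries a device map with more than one entry and the launcher reports any distributed mode other than "NO". The function name, the mode strings, and the example device map are all made up for illustration. One possibility this would be consistent with is that a launch config selecting a distributed backend (for example DeepSpeed) still counts as a distributed mode even with a single process, but that is an assumption to verify against the installed accelerate version.

```python
from typing import Dict, Optional, Union

DeviceMap = Dict[str, Union[int, str]]


def blocks_distributed_training(device_map: Optional[DeviceMap], distributed_mode: str) -> bool:
    """Hypothetical model of the condition behind the ValueError.

    Assumptions (not taken from the issue): the error fires when the model
    carries a device map with more than one entry AND the launcher reports
    any distributed mode other than "NO".
    """
    has_multi_entry_map = device_map is not None and len(device_map) > 1
    return has_multi_entry_map and distributed_mode != "NO"


# Illustrative device map, similar in shape to what device_map="auto" may produce.
auto_map = {"model.embed_tokens": 0, "model.layers.0": 0, "lm_head": 1}

print(blocks_distributed_training(auto_map, "MULTI_GPU"))  # True
print(blocks_distributed_training(auto_map, "DEEPSPEED"))  # True
print(blocks_distributed_training(auto_map, "NO"))         # False
print(blocks_distributed_training(None, "DEEPSPEED"))      # False
```

If this model is roughly right, two things are worth checking: whether train_huatuo.py loads the model with device_map='auto', and whether the launch configuration used by train.sh still selects a distributed backend when "--num_processes" is 1.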
