Model will be loaded on different devices when using multiple gpus. #67

Closed
baichuanzhou opened this issue Apr 26, 2024 · 6 comments

@baichuanzhou
Contributor

It appears that the model is loaded on different GPUs when num_processes is set to more than one, which causes the following error:
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1!

Here's my command to launch:

accelerate launch --num_processes=2 -m lmms_eval --model llava   --model_args pretrained="xxx,conv_template=xxx"   --tasks gqa,vqav2,scienceqa,textvqa --batch_size 1 --log_samples --log_samples_suffix xxx --output_path ./logs/

I found a temporary fix by installing a previous version:
pip install git+https://github.com/EvolvingLMMs-Lab/lmms-eval.git@bf4c78b7e405e2ca29bf76f579371382fec3dd02
With this version, multi-GPU inference works fine.

@kcz358
Collaborator

kcz358 commented Apr 28, 2024

May I ask at which line of the inference this error occurred?

@baichuanzhou
Contributor Author

baichuanzhou commented May 7, 2024

Sorry for the delay.

Here is one error message:

[lmms_eval/models/llava.py:386] ERROR Error Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0! (when checking argument for argument weight in method wrapper_CUDA__cudnn_convolution) in generating.
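
For what it's worth, this class of failure can be reproduced with a minimal PyTorch snippet that has nothing to do with lmms_eval (purely illustrative, and it needs a machine with at least two GPUs): a layer whose weights sit on one device receives an input that lives on another.

import torch

# Illustrative only: weights on cuda:0, input on cuda:1.
conv = torch.nn.Conv2d(3, 8, kernel_size=3).to("cuda:0")
x = torch.randn(1, 3, 224, 224, device="cuda:1")
conv(x)  # RuntimeError: Expected all tensors to be on the same device, ...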

@kcz358
Collaborator

kcz358 commented May 7, 2024

You might also want to try setting device_map=auto in your model_args when you do multi-processing:

--model_args pretrained=xxx,conv_template=xxx,device_map=auto

@baichuanzhou
Contributor Author

baichuanzhou commented May 7, 2024

Setting device_map to auto didn't do the trick. Here's my command:

srun -p xxx --gres=gpu:4 accelerate launch --num_processes=4 --main_process_port 19500 -m lmms_eval --model llava   --model_args pretrained="xxx,conv_template=xxx,device_map=auto"   --task textvqa_val,vizwiz_vqa_val,mmbench_en --batch_size 1 --log_samples --log_samples_suffix llava_hermes2_llama3_merged_data_v1.1_anyres_tune_vit --output_path ./logs/ #

One difference I noticed between evaluating with v0.1.2 and with bf4c78b7e405e2ca29bf76f579371382fec3dd02 was this logger output:
v0.1.2: [lmms_eval/models/llava.py:124] INFO Using single device: cuda
bf4c78b7e405e2ca29bf76f579371382fec3dd02: [lmms_eval/models/llava.py:104] INFO Using 4 devices with data parallelism

Line 104 appears to be here.
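
For context, those two log lines suggest a device-selection pattern roughly like the following (a sketch built on Hugging Face Accelerate, not the actual lmms_eval source; names are illustrative):

import torch
from accelerate import Accelerator

accelerator = Accelerator()
if accelerator.num_processes > 1:
    # Data parallelism: each process keeps a full copy of the model on its
    # own GPU, so the model must live on exactly one device per process.
    device = torch.device(f"cuda:{accelerator.local_process_index}")
else:
    # Single process: one device, or device_map=auto to shard across GPUs.
    device = torch.device("cuda")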

@kcz358
Collaborator

kcz358 commented May 7, 2024

Sorry, my bad.

You should set device_map="" when using multiple processes. Set device_map=auto only when you use num_processes=1.
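
For reference, the two launch variants would then look roughly like this (placeholders kept as xxx; the task list and other flags are illustrative, not prescriptive):

Multi-process data parallelism (one full model copy per GPU):
accelerate launch --num_processes=4 -m lmms_eval --model llava --model_args pretrained=xxx,conv_template=xxx,device_map="" --tasks textvqa_val --batch_size 1 --output_path ./logs/

Single process, model sharded across GPUs:
accelerate launch --num_processes=1 -m lmms_eval --model llava --model_args pretrained=xxx,conv_template=xxx,device_map=auto --tasks textvqa_val --batch_size 1 --output_path ./logs/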

@baichuanzhou
Contributor Author

Thanks. Now it works!
