Traceback (most recent call last):
  File "scripts/image_sample.py", line 143, in <module>
    main()
  File "scripts/image_sample.py", line 91, in main
    dist.all_gather(gathered_samples, sample)  # gather not supported with NCCL
  File "home/anaconda3/envs/consistency/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 2433, in all_gather
    work = default_pg.allgather([tensor_list], [tensor])
torch.distributed.DistBackendError: NCCL error in: ../torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:1275, internal error, NCCL version 2.14.3
ncclInternalError: Internal check failed.
Last error:
Cuda failure 'peer access is not supported between these two devices'
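The last line of the error suggests that at least one pair of GPUs on this machine cannot reach each other through CUDA peer-to-peer access. A rough way to confirm this is sketched below. It is only a diagnostic sketch, not part of the repository: `blocked_peer_pairs` is a hypothetical helper name, and the script assumes PyTorch with CUDA is installed (it uses `torch.cuda.can_device_access_peer`).

```python
from itertools import permutations
from typing import Callable, List, Tuple


def blocked_peer_pairs(
    num_devices: int,
    can_access: Callable[[int, int], bool],
) -> List[Tuple[int, int]]:
    """Return ordered (src, dst) device pairs for which peer access is unavailable.

    `can_access` is any callable answering "can device src access device dst?".
    With PyTorch this can be torch.cuda.can_device_access_peer.
    """
    return [
        (src, dst)
        for src, dst in permutations(range(num_devices), 2)
        if not can_access(src, dst)
    ]


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None

    if torch is not None and torch.cuda.is_available():
        n = torch.cuda.device_count()
        blocked = blocked_peer_pairs(n, torch.cuda.can_device_access_peer)
        print(f"{n} GPUs visible, pairs without peer access: {blocked}")
        if blocked:
            # Assumption: disabling NCCL peer-to-peer transport may avoid the
            # failing code path; this is untested on the setup in this issue.
            print("Possible workaround: export NCCL_P2P_DISABLE=1 before launching.")
    else:
        print("PyTorch with CUDA is not available; nothing to check.")
```

If the list is non-empty, setting the NCCL environment variable `NCCL_P2P_DISABLE=1` before launching might work around the error, at some cost in communication speed. Whether the variable reaches all 8 ranks depends on how `mpiexec` forwards the environment, so that would need to be checked as well.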
fikry102 changed the title from "multi-gpu error" to "“multi-gpu error” dist.all_gather(gathered_samples, sample) # gather not supported with NCCL" on Oct 3, 2023
mpiexec -n 8 python scripts/image_sample.py --batch_size 32 --training_mode consistency_distillation --sampler multistep --ts 0,62,150 --steps 151 --model_path ./ct_cat256.pt --attention_resolutions 32,16,8 --class_cond False --use_scale_shift_norm False --dropout 0.0 --image_size 256 --num_channels 256 --num_head_channels 64 --num_res_blocks 2 --num_samples 500 --resblock_updown True --use_fp16 True --weight_schedule uniform