When I train on two GPUs (2 × 1080 Ti), the following error occurs.
The configuration is CUDA 11.1, PyTorch 1.8.1, torchvision 0.9.1, and Python 3.8.3:
Warning: multi_tensor_applier fused unscale kernel is unavailable, possibly because apex was installed without --cuda_ext --cpp_ext. Using Python fallback. Original ImportError was: ModuleNotFoundError("No module named 'amp_C'")
Warning: apex was installed without --cpp_ext. Falling back to Python flatten and unflatten.
Training (X / X Steps) (loss=X.X): 0%|| 0/749 [00:00<?, ?it/s]Warning: apex was installed without --cpp_ext. Falling back to Python flatten and unflatten.
Training (X / X Steps) (loss=X.X): 0%|| 0/749 [00:42<?, ?it/s]
Traceback (most recent call last):
File "train.py", line 400, in <module>
main()
File "train.py", line 397, in main
train(args, model)
File "train.py", line 226, in train
loss, logits = model(x, y)
File "/home/lirunze/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/lirunze/anaconda3/lib/python3.8/site-packages/apex-0.1-py3.8.egg/apex/parallel/distributed.py", line 560, in forward
result = self.module(*inputs, **kwargs)
File "/home/lirunze/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/lirunze/anaconda3/lib/python3.8/site-packages/apex-0.1-py3.8.egg/apex/amp/_initialize.py", line 196, in new_fwd
output = old_fwd(*applier(args, input_caster),
File "/home/lirunze/xh/project/git/trans-fg_-i2-t/models/modeling.py", line 305, in forward
part_logits = self.part_head(part_tokens[:, 0])
File "/home/lirunze/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/lirunze/anaconda3/lib/python3.8/site-packages/torch/nn/modules/linear.py", line 94, in forward
return F.linear(input, self.weight, self.bias)
File "/home/lirunze/anaconda3/lib/python3.8/site-packages/torch/nn/functional.py", line 1753, in linear
return torch._C._nn.linear(input, weight, bias)
RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasGemmEx( handle, opa, opb, m, n, k, &falpha, a, CUDA_R_16F, lda, b, CUDA_R_16F, ldb, &fbeta, c, CUDA_R_16F, ldc, CUDA_R_32F, CUBLAS_GEMM_DFALT_TENSOR_OP)`
Could you help analyze this problem? Thank you!
Thanks for your work and for sharing your code!