My custom dataset contains 6 classes, so I modified data_utils.py and changed 'num_classes = 6' in train.py. But I got these errors:
Training (X / X Steps) (loss=X.X): 0%|| 0/33 [00:00<?, ?it/s]/opt/conda/conda-bld/pytorch_1595629416375/work/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [5,0,0] Assertion t >= 0 && t < n_classes failed.
Training (X / X Steps) (loss=X.X): 0%|| 0/33 [00:00<?, ?it/s]
Traceback (most recent call last):
File "train_trash.py", line 335, in
main()
File "train_trash.py", line 331, in main
train(args, model)
File "train_trash.py", line 211, in train
loss.backward()
File "/root/anaconda3/envs/agr/lib/python3.6/site-packages/torch/tensor.py", line 185, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph)
File "/root/anaconda3/envs/agr/lib/python3.6/site-packages/torch/autograd/init.py", line 127, in backward
allow_unreachable=True) # allow_unreachable flag
RuntimeError: CUDA error: CUBLAS_STATUS_ALLOC_FAILED when calling cublasCreate(handle)
Exception raised from createCublasHandle at /opt/conda/conda-bld/pytorch_1595629416375/work/aten/src/ATen/cuda/CublasHandlePool.cpp:8 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x4d (0x7f533ff7077d in /root/anaconda3/envs/agr/lib/python3.6/site-packages/torch/lib/libc10.so)
frame #1: + 0xcfc185 (0x7f53410d2185 in /root/anaconda3/envs/agr/lib/python3.6/site-packages/torch/lib/libtorch_cuda.so)
frame #2: at::cuda::getCurrentCUDABlasHandle() + 0xb75 (0x7f53410d3065 in /root/anaconda3/envs/agr/lib/python3.6/site-packages/torch/lib/libtorch_cuda.so)
frame #3: + 0xcef217 (0x7f53410c5217 in /root/anaconda3/envs/agr/lib/python3.6/site-packages/torch/lib/libtorch_cuda.so)
frame #4: at::native::(anonymous namespace)::addmm_out_cuda_impl(at::Tensor&, at::Tensor const&, at::Tensor const&, at::Tensor const&, c10::Scalar, c10::Scalar) + 0xf7e (0x7f534242985e in /root/anaconda3/envs/agr/lib/python3.6/site-packages/torch/lib/libtorch_cuda.so)
frame #5: at::native::mm_cuda(at::Tensor const&, at::Tensor const&) + 0xb3 (0x7f534242b353 in /root/anaconda3/envs/agr/lib/python3.6/site-packages/torch/lib/libtorch_cuda.so)
frame #6: + 0xd14ea0 (0x7f53410eaea0 in /root/anaconda3/envs/agr/lib/python3.6/site-packages/torch/lib/libtorch_cuda.so)
frame #7: + 0x7b1990 (0x7f5372b9b990 in /root/anaconda3/envs/agr/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
frame #8: at::Tensor c10::Dispatcher::call<at::Tensor, at::Tensor const&, at::Tensor const&>(c10::TypedOperatorHandle<at::Tensor (at::Tensor const&, at::Tensor const&)> const&, at::Tensor const&, at::Tensor const&) const + 0xbc (0x7f5373383c7c in /root/anaconda3/envs/agr/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
frame #9: at::mm(at::Tensor const&, at::Tensor const&) + 0x4b (0x7f53732d4b0b in /root/anaconda3/envs/agr/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
frame #10: + 0x2c2be8f (0x7f5375015e8f in /root/anaconda3/envs/agr/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
frame #11: + 0x7b1990 (0x7f5372b9b990 in /root/anaconda3/envs/agr/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
frame #12: at::Tensor c10::Dispatcher::call<at::Tensor, at::Tensor const&, at::Tensor const&>(c10::TypedOperatorHandle<at::Tensor (at::Tensor const&, at::Tensor const&)> const&, at::Tensor const&, at::Tensor const&) const + 0xbc (0x7f5373383c7c in /root/anaconda3/envs/agr/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
frame #13: at::Tensor::mm(at::Tensor const&) const + 0x4b (0x7f537346a10b in /root/anaconda3/envs/agr/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
frame #14: + 0x2a6d094 (0x7f5374e57094 in /root/anaconda3/envs/agr/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
frame #15: torch::autograd::generated::AddmmBackward::apply(std::vector<at::Tensor, std::allocator<at::Tensor> >&&) + 0x2d5 (0x7f5374e5d055 in /root/anaconda3/envs/agr/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
frame #16: + 0x30d1017 (0x7f53754bb017 in /root/anaconda3/envs/agr/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
frame #17: torch::autograd::Engine::evaluate_function(std::shared_ptr<torch::autograd::GraphTask>&, torch::autograd::Node*, torch::autograd::InputBuffer&, std::shared_ptr<torch::autograd::ReadyQueue> const&) + 0x1400 (0x7f53754b6860 in /root/anaconda3/envs/agr/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
frame #18: torch::autograd::Engine::thread_main(std::shared_ptr<torch::autograd::GraphTask> const&) + 0x451 (0x7f53754b7401 in /root/anaconda3/envs/agr/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
frame #19: torch::autograd::Engine::thread_init(int, std::shared_ptr<torch::autograd::ReadyQueue> const&, bool) + 0x89 (0x7f53754af579 in /root/anaconda3/envs/agr/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
frame #20: torch::autograd::python::PythonEngine::thread_init(int, std::shared_ptr<torch::autograd::ReadyQueue> const&, bool) + 0x4a (0x7f53797de13a in /root/anaconda3/envs/agr/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #21: + 0xc819d (0x7f537c30f19d in /root/anaconda3/envs/agr/lib/python3.6/site-packages/torch/lib/../../../.././libstdc++.so.6)
frame #22: + 0x76db (0x7f53a0e6c6db in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #23: clone + 0x3f (0x7f53a01e8a3f in /lib/x86_64-linux-gnu/libc.so.6)
I guess this error is caused by labels going out of range (the `t >= 0 && t < n_classes` assertion above), but I can't find where to fix it. Could you please help me solve this problem?
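To confirm this guess, I'm planning to run a quick sanity check over my DataLoader before training. This is only a sketch for my own setup; the `check_label_range` helper and the loader passed to it are mine, not part of this repo:

```python
import torch

def check_label_range(loader, num_classes=6):
    """Verify every target index is in [0, num_classes - 1], the range that
    nn.CrossEntropyLoss / NLLLoss expect; out-of-range labels trigger the
    `t >= 0 && t < n_classes` assertion shown in the log above."""
    for step, (_, y) in enumerate(loader):
        lo, hi = int(y.min()), int(y.max())
        if lo < 0 or hi >= num_classes:
            print(f"batch {step}: labels out of range (min={lo}, max={hi})")
            return False
    print("all labels are in range")
    return True
```

If this reports an out-of-range batch, I suspect the labels produced in data_utils.py start at 1 instead of 0, or contain more than 6 distinct values. Running with CUDA_LAUNCH_BLOCKING=1 should also make the assertion surface at the offending line instead of the later cuBLAS call.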
Thank you!