
[DLRM/Pytorch] Cuda error: illegal memory access after changing embedding size to 64 #830

@junshi15

Description


Related to DLRM/Pytorch

Describe the bug
Changed the embedding size to 64 (default: 128).
Changed the last layer of the bottom MLP to 64 (default: 128).
This caused the crash shown below (a minimal sketch of the expected shape relationship follows the traceback).

Traceback (most recent call last):
  File "/opt/conda/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/opt/conda/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/workspace/dlrm/dlrm/scripts/main.py", line 519, in <module>
    app.run(main)
  File "/opt/conda/lib/python3.6/site-packages/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/opt/conda/lib/python3.6/site-packages/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "/workspace/dlrm/dlrm/scripts/main.py", line 264, in main
    train(model, loss_fn, optimizer, data_loader_train, data_loader_test, scaled_lr)
  File "/workspace/dlrm/dlrm/scripts/main.py", line 361, in train
    loss.backward()
  File "/opt/conda/lib/python3.6/site-packages/torch/tensor.py", line 184, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/opt/conda/lib/python3.6/site-packages/torch/autograd/__init__.py", line 123, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: CUDA error: CUBLAS_STATUS_ALLOC_FAILED when calling `cublasCreate(handle)`
Exception raised from createCublasHandle at ../aten/src/ATen/cuda/CublasHandlePool.cpp:8 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x6b (0x7ff5f440a82b in /opt/conda/lib/python3.6/site-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0x327d0c2 (0x7ff4bbe1c0c2 in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cuda.so)
frame #2: at::cuda::getCurrentCUDABlasHandle() + 0xb82 (0x7ff4bbe1d9d2 in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cuda.so)
frame #3: <unknown function> + 0x326945f (0x7ff4bbe0845f in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cuda.so)
frame #4: at::native::addmm_out_cuda_impl(at::Tensor&, at::Tensor const&, at::Tensor const&, at::Tensor const&, c10::Scalar, c10::Scalar) + 0x78e (0x7ff4bacef5ee in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cuda.so)
frame #5: at::native::mm_cuda(at::Tensor const&, at::Tensor const&) + 0x15b (0x7ff4bacf04bb in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cuda.so)
frame #6: <unknown function> + 0x3293808 (0x7ff4bbe32808 in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cuda.so)
frame #7: <unknown function> + 0x330f734 (0x7ff4bbeae734 in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cuda.so)
frame #8: <unknown function> + 0x2ba029b (0x7ff537b0b29b in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
frame #9: <unknown function> + 0x7a8224 (0x7ff535713224 in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
frame #10: at::Tensor c10::Dispatcher::call<at::Tensor, at::Tensor const&, at::Tensor const&>(c10::OperatorHandle const&, at::Tensor const&, at::Tensor const&) const + 0xc5 (0x7ff5c6f346e5 in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #11: <unknown function> + 0x28fe447 (0x7ff537869447 in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
frame #12: torch::autograd::generated::AddmmBackward::apply(std::vector<at::Tensor, std::allocator<at::Tensor> >&&) + 0x155 (0x7ff5378aeca5 in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
frame #13: <unknown function> + 0x2ee2f75 (0x7ff537e4df75 in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
frame #14: torch::autograd::Engine::evaluate_function(std::shared_ptr<torch::autograd::GraphTask>&, torch::autograd::Node*, torch::autograd::InputBuffer&, std::shared_ptr<torch::autograd::ReadyQueue> const&) + 0x1808 (0x7ff537e48f68 in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
frame #15: torch::autograd::Engine::thread_main(std::shared_ptr<torch::autograd::GraphTask> const&, bool) + 0x551 (0x7ff537e49e01 in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
frame #16: torch::autograd::Engine::thread_init(int, std::shared_ptr<torch::autograd::ReadyQueue> const&) + 0xa3 (0x7ff537e3f863 in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
frame #17: torch::autograd::python::PythonEngine::thread_init(int, std::shared_ptr<torch::autograd::ReadyQueue> const&) + 0x50 (0x7ff5c7236b20 in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #18: <unknown function> + 0xbd6df (0x7ff5f4af76df in /usr/lib/x86_64-linux-gnu/libstdc++.so.6)
frame #19: <unknown function> + 0x76db (0x7ff5fffcf6db in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #20: clone + 0x3f (0x7ff5ffcf888f in /lib/x86_64-linux-gnu/libc.so.6)

terminate called after throwing an instance of 'c10::Error'
  what():  CUDA error: an illegal memory access was encountered
Exception raised from create_event_internal at ../c10/cuda/CUDACachingAllocator.cpp:687 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x6b (0x7ff5f440a82b in /opt/conda/lib/python3.6/site-packages/torch/lib/libc10.so)
frame #1: c10::cuda::CUDACachingAllocator::raw_delete(void*) + 0xc10 (0x7ff5f41a5500 in /opt/conda/lib/python3.6/site-packages/torch/lib/libc10_cuda.so)
frame #2: c10::TensorImpl::release_resources() + 0x4d (0x7ff5f43f2c9d in /opt/conda/lib/python3.6/site-packages/torch/lib/libc10.so)
frame #3: <unknown function> + 0x59f1e2 (0x7ff5c724b1e2 in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
<omitting python frames>
frame #16: __libc_start_main + 0xe7 (0x7ff5ffbf8b97 in /lib/x86_64-linux-gnu/libc.so.6)

Fatal Python error: Aborted

Thread 0x00007ff59fda0700 (most recent call first):

Thread 0x00007ff56b58b700 (most recent call first):

Current thread 0x00007ff6003fc740 (most recent call first):
Aborted
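
For reference, here is a minimal sketch in plain PyTorch (not the repository's code; the batch size and number of embedding tables are made-up values) of why the last bottom-MLP width is kept equal to --embedding_dim: the dot interaction stacks the bottom-MLP output together with every embedding vector, so all of them need the same width. If the repository's interaction op works the same way, a matching 64/64 configuration is shape-consistent and should not be a user-side mismatch.

# Minimal sketch, not the repository's code: batch and num_tables are
# illustrative assumptions, dim corresponds to --embedding_dim.
import torch

batch, num_tables, dim = 2048, 26, 64

bottom_mlp_out = torch.randn(batch, dim)        # last layer of --bottom_mlp_sizes 512,256,64
emb_out = torch.randn(batch, num_tables, dim)   # one 64-d vector per embedding table

# The dot interaction concatenates the bottom-MLP output with the embeddings,
# so every vector must have the same width (64/64 here is consistent).
features = torch.cat([bottom_mlp_out.unsqueeze(1), emb_out], dim=1)  # (batch, 1 + num_tables, dim)
pairwise = torch.bmm(features, features.transpose(1, 2))             # (batch, 27, 27)
print(pairwise.shape)  # torch.Size([2048, 27, 27])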

To Reproduce
Use the following command-line flags (a debugging sketch for localizing the faulting kernel follows):
--embedding_dim 64 --bottom_mlp_sizes 512,256,64
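
Because CUDA errors are reported asynchronously, the cuBLAS failure inside loss.backward() may not be the kernel that actually faulted. One way to narrow it down (the module path is inferred from the traceback; the remaining flags are whatever was used above):

# Force synchronous kernel launches so the error surfaces at the faulting
# call rather than at a later cublasCreate. Must be set before the first
# CUDA call (or exported in the shell before launching).
import os
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# Shell equivalent:
#   CUDA_LAUNCH_BLOCKING=1 python -m dlrm.scripts.main \
#       --embedding_dim 64 --bottom_mlp_sizes 512,256,64 <other flags as above>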

Expected behavior
It should not crash.

Environment
Please provide at least:

  • Container version (e.g. pytorch:20.06-py3):
  • GPUs in the system: (e.g. 1x Tesla V100 32GB):
  • CUDA driver version (e.g. 418.67):
